Abstract We consider estimation of a one-dimensional location parameter by means of M-estimators $S_n$ with monotone influence curve $\psi$. For growing sample size $n$, on suitably thinned-out convex contamination balls $\mathcal{Q}_n$ of shrinking radius $r/\sqrt{n}$ about the ideal distribution, we obtain an expansion of the asymptotic maximal mean squared error (MSE) of the form

$$\max_{Q_n \in \mathcal{Q}_n} n\,\mathrm{MSE}(S_n, Q_n) = r^2 \sup\psi^2 + \mathrm{E}_{\mathrm{id}}\,\psi^2 + \frac{r}{\sqrt{n}}\,A_1 + \frac{1}{n}\,A_2 + o\big(\tfrac{1}{n}\big),$$

where $A_1, A_2$ are constants depending on $\psi$ and $r$. Hence $S_n$ is uniformly (square) integrable in $n$ not only in the ideal model but also on $\mathcal{Q}_n$, which is not self-evident. For this result, the thinning of the neighborhoods by a breakdown-driven, sample-wise restriction is crucial, but exponentially negligible. Moreover, our results essentially characterize contaminations generating maximal MSE up to $o(n^{-1})$. Our results are confirmed empirically by simulations as well as by numerical evaluations of the risk.
1 Motivation/introduction

In the setup of shrinking neighborhoods about a general, parametric ideal central model, Rieder [22] determines the asymptotically linear estimator (ALE) minimaxing the asymptotic MSE on these neighborhoods. We address the question to which degree this asymptotic optimality carries over to finite sample size and try to identify and quantify which aspects of both estimator and neighborhood are responsible for the quality of the approximation.

1.1 Setup: one-dimensional location 

As a starting point for assessing such questions we consider the most basic parametric model of statistics, the one-dimensional location model $\{P_\theta(dx) = F(dx - \theta) \mid \theta \in \mathbb{R}\}$ for some ideal distribution $F$ with finite Fisher information of location $\mathcal{I}(F)$ in the sense of Huber [14, Ch. 4, Def. 4.1, Thm. 4.2], i.e., $\mathcal{I}(F) := \sup_{g \in C^1_c} (\int g'\, dF)^2 / \int g^2\, dF < \infty$, entailing that $\Lambda_f = -f'/f \in L_2(F)$ and $\mathcal{I}(F) = \mathrm{E}[\Lambda_f^2]$. Paralleling Huber [14], we also assume that $\Lambda_f$ is increasing. By translation equivariance, we may restrict ourselves to $\theta_0 = 0$, which is suppressed in the notation.

The set $\Psi$ of influence curves (ICs) in this model is defined as in Rieder [22]:

$$\Psi := \big\{\psi \in L_2(F) \;\big|\; \mathrm{E}[\psi] = 0,\ \mathrm{E}[\psi\Lambda_f] = 1\big\} \qquad (1.1)$$

where both expectations are evaluated under $F$.

Shrinking neighborhoods. Robust Statistics enlarges the ideal model assumptions by suitable neighborhoods about them. The shrinking neighborhood approach (compare e.g. Rieder [22], Kohl et al. [17]) balances bias and variance, which would otherwise be of different scaling in $n$; see also Ruckdeschel [24]. For this paper we consider contamination neighborhoods, i.e., the set $\mathcal{Q}_n(r)$ of distributions

$$Q_n = \bigotimes_{i=1}^{n} \Big[\Big(1 - \frac{r_n}{\sqrt{n}}\Big) F + \frac{r_n}{\sqrt{n}}\, P^{\mathrm{di}}_{n,i}\Big] \qquad (1.2)$$

with $r_n = \min(r, \sqrt{n})$, $r > 0$ the contamination radius, and $P^{\mathrm{di}}_{n,i} \in \mathcal{M}_1(\mathbb{B})$ arbitrary, uncontrollable contaminating distributions. As usual, we interpret $Q_n$ as the distribution of the vector $(X_i)_{i \le n}$ with components

$$X_i = (1 - U_i)\,X_i^{\mathrm{id}} + U_i\,X_i^{\mathrm{di}}, \qquad i = 1, \ldots, n \qquad (1.3)$$

for $X_i^{\mathrm{id}}, U_i, X_i^{\mathrm{di}}$ stochastically independent, $X_i^{\mathrm{id}} \sim F$, $U_i \sim \mathrm{Bin}(1, r_n/\sqrt{n})$, and $(X_i^{\mathrm{di}}) \sim P_n^{\mathrm{di}}$ for some arbitrary $P_n^{\mathrm{di}} \in \mathcal{M}_1(\mathbb{B}^n)$.
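A minimal Python sketch of the representation (1.3), for illustration only: it assumes $F = \mathcal{N}(0, 1)$ and, as a hypothetical choice, a Dirac contamination at `x_di` (the simulation study in section 5 uses the point 100). The helper name `sample_contaminated` is ours and is not taken from the paper's R scripts.

```python
import math
import random

def sample_contaminated(n, r, x_di=100.0, rng=None):
    """Draw X_1..X_n from (1.3): X_i = (1 - U_i) X_i^id + U_i X_i^di.

    Assumptions (illustration only): X_i^id ~ N(0, 1), the contamination is a
    point mass at x_di, and U_i ~ Bin(1, r_n / sqrt(n)) with r_n = min(r, sqrt(n)).
    """
    rng = rng or random.Random()
    r_n = min(r, math.sqrt(n))
    p = r_n / math.sqrt(n)                        # contamination probability per observation
    us = [1 if rng.random() < p else 0 for _ in range(n)]
    xs = [x_di if u else rng.gauss(0.0, 1.0) for u in us]
    return xs, us
```

The switching variables are returned along with the sample so that the number of contaminated observations can be inspected, which is what the sample-wise restriction of section 2 conditions on.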

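First-order optimality. For a sequence of estimators $S_n$, consider as risk the asymptotic (modified) maximal MSE on $\mathcal{Q}_n$

$$R(S_n, r) := \lim_{M \to \infty}\ \limsup_{n \to \infty}\ \sup_{Q_n \in \mathcal{Q}_n(r)} \int \min\{M,\ n|S_n - \theta_0|^2\}\, dQ_n \qquad (1.4)$$

Following Rieder [22, Ch. 5], a (suitably constructed) ALE $S_n$ with IC $\psi$ has risk

$$R(S_n, r) = r^2 \sup|\psi|^2 + \mathrm{E}_{\mathrm{id}}\,\psi^2 \qquad (1.5)$$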
By Theorem 5.5.7 (ibid.), together with its preceding remarks, for given $r > 0$ a (suitably constructed) ALE with IC $\psi$ minimizes $R(\cdot, r)$ among all ALEs iff $\psi = \eta_{c_0}$ for Lagrange multipliers $z$ and $A$ such that $\eta_{c_0}$ is an IC, where

$$\eta_c = A(\Lambda_f - z)\min\{1,\ c/|\Lambda_f - z|\}, \qquad \mathrm{E}\big[(|\Lambda_f - z| - c_0)_+\big] = r^2 c_0 \qquad (1.6)$$
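For the Gaussian model ($F = \mathcal{N}(0, 1)$, so $\Lambda_f(x) = x$ and $z = 0$ by symmetry), the second condition in (1.6) reduces to the scalar equation $\mathrm{E}[(|X| - c)_+] = 2(\varphi(c) - c(1 - \Phi(c))) = r^2 c$, whose left-hand side minus $r^2 c$ is strictly decreasing in $c$. The following sketch (helper names are ours) solves it by bisection; it assumes this Gaussian specialization only.

```python
import math

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def clipping_height(r, tol=1e-12):
    """Solve E[(|X| - c)_+] = r^2 c, X ~ N(0,1) (Gaussian case of (1.6)), by bisection."""
    if r <= 0.0:
        return math.inf                                      # no clipping in the ideal model
    g = lambda c: 2.0 * (phi(c) - c * (1.0 - Phi(c))) - r * r * c   # strictly decreasing in c
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:                                       # enlarge bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def A(c):
    """A_c = (2 Phi(c) - 1)^{-1}, the factor making eta_c an IC in the Gaussian model."""
    return 1.0 / (2.0 * Phi(c) - 1.0)
```

For $r = 0.25$ this returns a value close to $c_0 = 1.3393$, the clipping height listed in Table 5; larger radii give smaller clipping heights.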

Open issues in this setup. Being bound to first-order asymptotics, these results so far do not come with an indication of the speed of convergence; it is not clear to what degree radius $r$, sample size $n$ and clipping height $b$ affect this approximation. The theorem only characterizes the optimal expansion in terms of ICs. Finally, modification (1.4) of the MSE, which is common in asymptotic statistics (cf. Le Cam [18], Rieder [22], Bickel et al. [4], van der Vaart [31]) and which forces the integrals to converge under weak convergence, appears somewhat ad hoc. One would perhaps prefer a modification that is statistically motivated.



1.2 M-estimators for location 

There are several constructions of an ALE achieving a given IC $\psi$: one-step constructions, M-estimators, L-estimators, and many more. In this paper we confine ourselves to M-estimators. We require $\psi$ to be monotone and bounded and write $\psi_t(\cdot)$ for $\psi(\cdot - t)$. For technical reasons we assume that the set $D_t$ of discontinuities of the c.d.f. of $\psi_t(X^{\mathrm{id}})$ carries mass less than 1, uniformly in $t$:

$$\sup_t \Pr\big\{\psi_t(X^{\mathrm{id}}) \in D_t\big\} < 1 \qquad (1.7)$$

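Following the notation in Huber [14, p. 46], let

$$S_n^{*} := \sup\Big\{t \;\Big|\; \sum_i \psi_t(x_i) > 0\Big\}, \qquad S_n^{**} := \inf\Big\{t \;\Big|\; \sum_i \psi_t(x_i) < 0\Big\} \qquad (1.8)$$

and let $S_n$ be any estimator satisfying $S_n^{*} \le S_n \le S_n^{**}$. By monotonicity of $\psi$, we get

$$\Pr\{S_n^{*} < t\} = \Pr\Big\{\sum_{i \le n} \psi_t(x_i) < 0\Big\}, \qquad \Pr\{S_n^{**} < t\} = \Pr\Big\{\sum_{i \le n} \psi_t(x_i) \le 0\Big\} \qquad (1.9)$$

at the continuity points $t$ of the LHS. The next lemma, an immediate consequence of Hall [10, Theorem 2.3], shows that we may ignore the event $S_n^{*} \neq S_n^{**}$ if we are interested in statements valid up to $o(1/n)$.

Lemma 1.1 Under (1.7), $\Pr(S_n^{*} \neq S_n^{**}) = O(\exp(-\gamma n))$ for some $\gamma > 0$.

Remark 1.2 If $\bigcup_t D_t = \{\pm c\}$ for some $c > 0$, then $\Pr(S_n^{*} \neq S_n^{**}) = 0$ for $n$ odd.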
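To make (1.8) concrete, the following sketch approximates $S_n^{*}$ and $S_n^{**}$ by two bisections on the non-increasing map $t \mapsto \sum_i \psi_c(x_i - t)$. It assumes the clipped identity $\psi_c(u) = \max(-c, \min(c, u))$, i.e. a Hampel-type IC without the factor $A_c$, which does not change the roots; all names are illustrative.

```python
def psi_c(u, c=1.0):
    """Clipped identity max(-c, min(c, u)): a Hampel-type IC without the factor A_c."""
    return max(-c, min(c, u))

def score(xs, t, c):
    """t -> sum_i psi_c(x_i - t); non-increasing in t because psi_c is monotone."""
    return sum(psi_c(x - t, c) for x in xs)

def m_estimator_bounds(xs, c=1.0, iters=200):
    """Approximate S*_n and S**_n of (1.8) by bisection."""
    lo0, hi0 = min(xs) - 1.0, max(xs) + 1.0   # score > 0 at lo0, score < 0 at hi0
    lo, hi = lo0, hi0                          # S*: keep score(lo) > 0 >= score(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(xs, mid, c) > 0:
            lo = mid
        else:
            hi = mid
    s_star = 0.5 * (lo + hi)
    lo, hi = lo0, hi0                          # S**: keep score(lo) >= 0 > score(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(xs, mid, c) < 0:
            hi = mid
        else:
            lo = mid
    s_2star = 0.5 * (lo + hi)
    return s_star, s_2star
```

For the even sample $(-5, 5)$ with $c = 1$ the score vanishes on a whole interval, so the two bounds are $-4$ and $4$; adding a third point $0.3$ (odd $n$) makes them coincide at $0.3$, in line with Remark 1.2.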
1.3 Organization of this paper and description of the results

This paper provides answers to some of the open questions mentioned in subsection 1.1. These answers were initiated by an attempt in 2003 to check the validity of Rieder's asymptotic approach at finite sample sizes by simulations. On closer inspection of these simulations, M. Kohl found that larger inaccuracies of (first-order) asymptotics only occurred in extreme sample situations where more than half of the sample stemmed from a contamination, which made him conjecture that, once such samples are excluded, asymptotics might prove useful even for very small samples. With regard to our shrinking setup, such an exclusion on the one hand is asymptotically negligible, hence does not affect the results of subsection 1.1; on the other hand, under this restriction the unmodified MSE indeed converges along with weak convergence. We discuss this modification in section 2. In section 3 we present the central theoretical result, Theorem 3.5. This result is of the following form

$$\sup_{Q_n \in \mathcal{Q}_n(r;\varepsilon_0)} n\,\mathrm{MSE}(S_n, Q_n) = r^2 \sup|\psi|^2 + \mathrm{E}\,\psi^2 + \frac{r}{\sqrt{n}}\,A_1 + \frac{1}{n}\,A_2 + o\big(\tfrac{1}{n}\big) \qquad (1.10)$$

Here $S_n$ is an M-estimator to IC $\psi$, and $A_1, A_2$ are polynomials in the contamination radius $r$, in $b = \sup|\psi|$, and in the moment functions $t \mapsto \mathrm{E}\,\psi_t^l$, $l = 1, \ldots, 4$, and their derivatives evaluated at $t = 0$. We recognize at once that the speed of convergence to the first-order asymptotic value is one order faster in the ideal model.

Notation 1.3 For indices we start counting with 0, so that terms of first-order asymptotics have index 0, second-order ones index 1, and so on. Also we abbreviate first-order, second-order and third-order by f-o, s-o, t-o, respectively, and we write f-o-o, s-o-o, and t-o-o for first-, second-, and third-order asymptotically optimal, respectively.

As to the correctness of our main result, we give a number of cross-checks and comments on this result in section 4. The relevance of these results for (small) finite sample sizes is shown by a simulation study, whose design and results are presented in section 5. By means of an adapted convolution algorithm taken from Ruckdeschel and Kohl [27], we also compute numerically exact values of the MSE. Proofs are delegated to appendix section A. These contain rather tedious Taylor expansions for which we need the help of a symbolic algebra program like MAPLE. To ease readability, we therefore start the proof of the main theorem with an outline of the essential steps. Some auxiliary results needed in the proofs are provided in appendix section B.

On a web page accompanying this paper, additional tables and figures, the MAPLE script to generate the expansions, and the R script to calculate numerically exact MSEs are available for download.



2 Modification of the shrinking neighborhood setup

The key property in the shrinking-neighborhood setup is the LAN property (local asymptotic normality) in the sense of Hájek and Le Cam. LAN holds for $L_2$-differentiable models, cf. Rieder [22, Thm. 2.3.5], and, together with Le Cam's third lemma (cf. Cor. 2.2.6 ibid.), implies uniform weak convergence of any (suitably constructed) ALE to a bounded IC on a representative subclass of the system of neighboring distributions $\mathcal{Q}_n$, namely those distributions induced by simple perturbations $Q_n(\zeta, t)$; see p. 126 (ibid.).
Without additional assumptions, however, this weak convergence does not in general carry over to convergence of the risk for an unbounded loss function, i.e., uniform integrability fails on any proper neighborhoods shrinking arbitrarily fast, which can be seen along the lines of Ruckdeschel [25, Prop. 2.1].

Modification of the shrinking neighborhood setup. We instead propose the following modification of the neighborhoods for finite $n$: only realizations of $U_1, \ldots, U_n$ with $\sum_i U_i < n/2$ are permitted. More precisely, accounting for non-symmetric $\psi$, we introduce

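$$\underline{b} := \inf\psi, \qquad \overline{b} := \sup\psi, \qquad \delta_0 := \frac{|\overline{b} + \underline{b}|}{\min\{\overline{b}, -\underline{b}\}} \qquad (2.1)$$

and recall that in our situation both the functional breakdown point (Huber [14, (2.39), (2.40)]) and the finite-sample ($\varepsilon$-contamination) breakdown point (Donoho and Huber [6, Section 2.2]) of $T$ respectively $S_n$ are

$$\varepsilon_0 = \frac{1}{2 + \delta_0} = \frac{\min\{\overline{b}, -\underline{b}\}}{\overline{b} - \underline{b}} \qquad (2.2)$$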

With these expressions, our modification amounts to considering the neighborhood system $\mathcal{Q}_n(r; \varepsilon_0)$ of conditional distributions

$$Q_n = \mathcal{L}\Big\{\big[(1 - U_i)X_i^{\mathrm{id}} + U_i X_i^{\mathrm{di}}\big]_i \;\Big|\; \sum_i U_i < n\varepsilon_0\Big\} \qquad (2.3)$$

This restriction hence combines a restriction on the marginals $\mathcal{L}(X_i)$, which are "close" to $\mathcal{L}(X_i^{\mathrm{id}})$ for each $i$, with a sample-wise restriction.
Correspondingly, we will consider the asymptotics of the unmodified MSE risk

$$R_n(S_n, r; \varepsilon_0) := \sup_{Q_n \in \mathcal{Q}_n(r;\varepsilon_0)} n \int |S_n - \theta_0|^2\, dQ_n \qquad (2.4)$$

Asymptotic negligibility of this modification. The effect of this modification is asymptotically negligible: by the Hoeffding bound (B.1),

$$\Pr\Big(\sum_i U_i \ge n\varepsilon_0\Big) \le \exp\big(-2n(\varepsilon_0 - r/\sqrt{n})^2\big) \qquad (2.5)$$

which decays exponentially fast. Thus, all results on convergence in law in the shrinking neighborhood setup are unaffected when passing from $\mathcal{Q}_n(r)$ to $\mathcal{Q}_n(r; \varepsilon_0)$.
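A quick numerical check of (2.5), assuming nothing beyond i.i.d. $U_i \sim \mathrm{Bin}(1, r/\sqrt{n})$: the sketch compares the exact binomial tail with the Hoeffding bound, and evaluates Huber's breakdown point $\min\{-\inf\psi, \sup\psi\}/(\sup\psi - \inf\psi)$ of a monotone M-estimator. Function names are illustrative.

```python
import math

def breakdown_point(b_lo, b_hi):
    """Breakdown point of a monotone M-estimator with inf psi = b_lo < 0 < b_hi = sup psi."""
    return min(-b_lo, b_hi) / (b_hi - b_lo)

def binom_upper_tail(n, p, k_min):
    """Exact Pr(sum_i U_i >= k_min) for U_1, ..., U_n i.i.d. Bin(1, p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1))

def thinning_check(n, r, eps0):
    """Exact Pr(sum_i U_i >= n eps0), U_i ~ Bin(1, r/sqrt(n)), and the Hoeffding bound (2.5)."""
    p = r / math.sqrt(n)
    assert p < eps0, "the bound (2.5) needs r/sqrt(n) < eps0"
    exact = binom_upper_tail(n, p, math.ceil(n * eps0))
    bound = math.exp(-2.0 * n * (eps0 - p) ** 2)
    return exact, bound
```

For a symmetric $\psi$ the breakdown point is $1/2$; at $n = 30$, $r = 0.5$ the bound is already below $10^{-4}$, and the exact tail is far smaller still.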

Remark 2.1 (a) Thinning out the neighborhoods is equally relevant for the interchange of integration and maximization in the context of neighborhoods of a fixed radius $\varepsilon$: replacing $r/\sqrt{n}$ by $\varepsilon$, asymptotic negligibility (2.5) continues to hold as long as $\varepsilon < \varepsilon_0$, while the failure of uniform integrability persists.

(b) M-estimators have the well-known feature that in general the procedure with optimal efficiency [minimax MSE in our context] does not attain maximal breakdown point [works with minimally thinned-out neighborhoods]; but, just as already mentioned in Rousseeuw [23] and similarly as worked out in Yohai [32], both goals may be achieved simultaneously by combining a starting M-estimator of maximal breakdown point with a correction by a one-/k-step construction with the f-o-o (or s-o-o, t-o-o) IC.
3 Main Theorem

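Notation. For $\psi: \mathbb{R} \to \mathbb{R}$ monotone, let $\psi_t(x) := \psi(x - t)$ and $\psi_t^\circ := \psi_t - \mathrm{E}\,\psi_t$, and define the following functions

$$L(t) := \mathrm{E}\,\psi_t, \quad V(t)^2 := \mathrm{E}\,(\psi_t^\circ)^2, \quad \rho(t) := \frac{\mathrm{E}\,(\psi_t^\circ)^3}{V(t)^3}, \quad \kappa(t) := \frac{\mathrm{E}\,(\psi_t^\circ)^4}{V(t)^4} - 3 \qquad (3.1)$$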
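To make the moment functions (3.1) concrete, the sketch below evaluates $L$, $V^2$, $\rho$ and $\kappa$ by numerical integration for the Hampel IC $\eta_c$ under $F = \mathcal{N}(0, 1)$ (the setting of Proposition 3.4). The composite Simpson rule, the integration range $[-12, 12]$ and the splitting at the kinks $t \pm c$ are our own choices; this is not the paper's MAPLE computation.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for j in range(1, m):
        s += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return s * h / 3.0

def normal_expect(g, kinks, lim=12.0):
    """E[g(X)], X ~ N(0,1), truncated to [-lim, lim] and split at the kinks of g."""
    pts = sorted({-lim, lim, *[k for k in kinks if -lim < k < lim]})
    return sum(simpson(lambda x: g(x) * phi(x), a, b) for a, b in zip(pts, pts[1:]))

def moment_functions(t, c):
    """Numerical L(t), V(t)^2, rho(t), kappa(t) from (3.1) for psi = eta_c and F = N(0,1)."""
    A = 1.0 / math.erf(c / math.sqrt(2.0))          # A_c = 1 / (2 Phi(c) - 1)
    psi_t = lambda x: A * max(-c, min(c, x - t))   # psi_t(x) = eta_c(x - t)
    kinks = (t - c, t + c)
    L = normal_expect(psi_t, kinks)
    m2, m3, m4 = (normal_expect(lambda x, k=k: (psi_t(x) - L) ** k, kinks) for k in (2, 3, 4))
    return L, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

Finite differences of these functions in $t$ give numerical counterparts of the coefficients $l_j$, $v_j$, $\rho_j$, $\kappa_0$ in the expansions (3.4) and (3.5).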

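Let $\underline{y}_n$ and $\overline{y}_n$ be sequences in $\mathbb{R}$ such that, for some $\gamma > 1$,

$$\psi(\underline{y}_n) = \inf\psi + o(n^{-\gamma}), \qquad \psi(\overline{y}_n) = \sup\psi + o(n^{-\gamma}) \qquad (3.2)$$

For $H \in \mathcal{M}_1(\mathbb{B}^n)$ and an ordered set of indices $I = (1 \le i_1 < \ldots < i_k \le n)$, denote by $H_I$ the marginal of $H$ with respect to $I$.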

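Definition 3.1 Consider sequences $c_n$, $d_n$, and $k_n$ in $\mathbb{R}$, in $(0, \infty)$, and in $\{1, \ldots, n\}$, respectively. We say that $(H^{(n)}) \subset \mathcal{M}_1(\mathbb{B}^n)$ is $k_n$-concentrated left [right] of $c_n$ up to $o(d_n)$ if, for each sequence of ordered sets $I_n$ of cardinality $i_n \le k_n$,

$$1 - H^{(n)}_{I_n}\big((-\infty, c_n]^{i_n}\big) = o(d_n) \qquad \Big[\,1 - H^{(n)}_{I_n}\big((c_n, \infty)^{i_n}\big) = o(d_n)\,\Big] \qquad (3.3)$$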

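General assumptions in this paper

(bmi) $\sup|\psi| = b < \infty$, $\psi$ monotone, $\psi \in \Psi$
(D) For some $\delta \in (0, 1]$, $L$, $V$, $\rho$, and $\kappa$ from (3.1) allow the expansions

$$L(t) = l_1 t + \tfrac{l_2}{2}t^2 + \tfrac{l_3}{6}t^3 + O(t^{3+\delta}), \qquad V(t) = v_0\big(1 + v_1 t + \tfrac{v_2}{2}t^2\big) + O(t^{2+\delta}) \qquad (3.4)$$

$$\rho(t) = \rho_0 + \rho_1 t + O(t^{1+\delta}), \qquad \kappa(t) = \kappa_0 + O(t^{\delta}) \qquad (3.5)$$

(Vb) $V(t) = O(|t|^{-(1+\delta)})$ for $|t| \to \infty$ and some $\delta \in (0, 1]$
(C) Let $f_t$ be the characteristic function of $\psi_t(X^{\mathrm{id}})$; then

$$\lim_{t_0 \to 0}\ \limsup_{s \to \infty}\ \sup_{|t| < t_0} |f_t(s)| < 1 \qquad (3.6)$$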

Condition (C) is a local uniform Cramér condition; it is implied by

Lemma 3.2 Assume $\mathcal{L}(\psi(X^{\mathrm{id}}))$ has a nontrivial absolutely continuous part and that $\psi$ is continuous. Then (C) is fulfilled.

Remark 3.3 (a) By condition (bmi), as $\psi \in \Psi$, $l_1 = -1$.

(b) Condition (C) is not fulfilled for the median, as its influence curve only takes the values $\pm b$ $F$-a.e. A direct proof of an analogue to Theorem 3.5 is possible, however, and is given in Ruckdeschel [25].

(c) For an expansion of the MSE up to $o(n^{-1/2})$, the $\kappa$ part of assumption (3.5) can be dropped, and we may use the assumptions

(D') For some $\delta \in (0, 1]$, $L$, $V$, and $\rho$ allow the expansions

$$L(t) = l_1 t + \tfrac{l_2}{2}t^2 + O(t^{2+\delta}), \qquad V(t) = v_0(1 + v_1 t) + O(t^{1+\delta}), \qquad \rho(t) = \rho_0 + O(t^{\delta}) \qquad (3.7)$$

(C') There exist $t_0 > 0$, $s_0 > 0$ such that for all $s_1 > s_0$

$$f_{s_0,t_0}(s_1) := \sup_{s_0 < s < s_1}\ \sup_{|t| < t_0} |f_t(s)| < 1 \qquad (3.8)$$

Note that (C) implies (C'), but contrary to (C), in (C') the case $\sup_{s_1 > s_0} f_{s_0,t_0}(s_1) = 1$ for all $s_0 > 0$ and all $t_0 > 0$ is allowed.
Illustration. We specialize the assumptions for $F = \mathcal{N}(0, 1)$, i.e., $\Lambda_f(x) = x$, and $\psi(x) = \eta_c(x) = A_c x \min\{1, c/|x|\}$ from (1.6), with $A_c$ such that $\eta_c \in \Psi$:

Proposition 3.4 For $F = \mathcal{N}(0, 1)$ and for $\psi = \eta_c$, an influence curve to $c \in (0, \infty)$ of Hampel form $\eta_c = A_c x \min\{1, c/|x|\}$ with $A_c = (2\Phi(c) - 1)^{-1}$, assumptions (bmi) to (C) are in force; in particular, the bounds in (Lb) and (Vb) are even exponential. With $\Phi(x)$ the c.d.f. of $\mathcal{N}(0, 1)$ and $\varphi(x)$ its density, we obtain $l_2 = 0$, $v_1 = 0$, $\rho_0 = 0$. For $c \in (0, \infty)$, we get

$$l_3 = 2cA_c\varphi(c), \qquad v_0^2 = 2c^2A_c^2\big(1 - \Phi(c)\big) + A_c\big(1 - 2cA_c\varphi(c)\big) = \frac{2c^2(1 - \Phi(c)) + 2\Phi(c) - 1 - 2c\varphi(c)}{(2\Phi(c) - 1)^2},$$

$$v_2 = \frac{A_c\big(1 - 2cA_c\varphi(c)\big) - 1}{v_0^2}, \qquad \rho_1 = \frac{3}{v_0} + \frac{3A_c^3\big(1 - 2\Phi(c) + 2c\varphi(c)\big)}{v_0^3},$$

$$\kappa_0 = \frac{2c^4(1 - \Phi(c)) - 2c(c^2 + 3)\varphi(c) + 3(2\Phi(c) - 1)}{\big[2c^2(1 - \Phi(c)) + 2\Phi(c) - 1 - 2c\varphi(c)\big]^2} - 3.$$

For $c \downarrow 0$, $l_3 = 1$, $v_0^2 = \pi/2$, $v_2 = -2/\pi$, $\rho_1 = 2\sqrt{2/\pi}$, $\kappa_0 = -2$, and formally, for $c \uparrow \infty$, $l_3 = 0$, $v_0 = 1$, $v_2 = 0$, $\rho_1 = 0$, $\kappa_0 = 0$.
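The closed forms for the Hampel IC are easy to evaluate. The sketch below (stdlib only, names ours) computes $l_3$, $v_0^2$, $v_2$, $\rho_1$, $\kappa_0$ together with $b = cA_c$ and the first-order risk $r^2b^2 + v_0^2$ from (1.5); $2\Phi(c) - 1$ and $1 - \Phi(c)$ are taken from `erf`/`erfc` to avoid cancellation for small $c$. As a sanity check, the first-order values agree with the $n^0$ columns of Tables 4 and 5 (e.g. 1.263 for $c = 0.5$ in the ideal model and 1.241 for $c = 1.0$ at $r = 0.25$), and the small/large-$c$ values approach the limits stated above.

```python
import math

SQ2 = math.sqrt(2.0)

def hampel_constants(c):
    """l3, v0^2, v2, rho1, kappa0 and b = sup|eta_c| for eta_c under N(0,1)."""
    erf_c = math.erf(c / SQ2)                          # 2 Phi(c) - 1
    tail = 0.5 * math.erfc(c / SQ2)                    # 1 - Phi(c)
    dens = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)   # phi(c)
    A = 1.0 / erf_c                                    # A_c
    N = 2.0 * c * c * tail + erf_c - 2.0 * c * dens    # E[min(X^2, c^2)]
    l3 = 2.0 * c * A * dens
    v0sq = N * A * A
    v0 = math.sqrt(v0sq)
    v2 = (A * (1.0 - l3) - 1.0) / v0sq
    rho1 = 3.0 / v0 + 3.0 * A ** 3 * (2.0 * c * dens - erf_c) / v0 ** 3
    kappa0 = (2.0 * c ** 4 * tail - 2.0 * c * (c * c + 3.0) * dens + 3.0 * erf_c) / N ** 2 - 3.0
    return dict(l3=l3, v0sq=v0sq, v2=v2, rho1=rho1, kappa0=kappa0, b=c * A)

def first_order_mse(c, r):
    """First-order asymptotic maximal MSE r^2 b^2 + v0^2, cf. (1.5)."""
    k = hampel_constants(c)
    return r * r * k["b"] ** 2 + k["v0sq"]
```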

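Theorem 3.5 (Main Theorem) In our one-dimensional location model assume (bmi) to (C). Then

(a) the maximal MSE of the M-estimator $S_n$ to scores function $\psi$ expands to

$$R_n(S_n, r; \varepsilon_0) = r^2 b^2 + v_0^2 + \frac{r}{\sqrt{n}}\,A_1 + \frac{1}{n}\,A_2 + o(n^{-1}) \qquad (3.9)$$

with

$$A_1 = v_0^2\big[\pm(4v_1 + 3l_2)\,b + 1\big] + b^2 + \big[2b^2 \pm l_2 b^3\big]\,r^2 \qquad (3.10)$$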
A2 = vo^ {(I2 + 2 vi )po + f Pi) + vo^ (3v2 + f ll + h +9v] + Uv^h) + 

+[vo^((3v2 + 3vi + f Z2 + 2Z3 + 12vi l2)b- + 1 + (8yi + 6/2)^) + 
±3l2b^ +5b^]r^ -\-[(lll-\- ^^h)b^±3l2b^ + 3b^)r'^ (3.11) 

and we are in the $-$ [$+$] case depending on whether (3.12) or (3.13) below applies.

(b) /ef P^' := (H^lLi ^ti contaminating measures for ( |1.2| l. T/ien 2„ w/f/z 

ai contaminating measures generates maximal risk in p.9[ ) ///or A;i > 1 anc/ k2 > 
2 V (| + ^) w/f/i 6 from (Vb) and Ki{n) - '~k\ryfn^ either 



(f )',') is Ki{n)— concentrated left ofy„ — b ■\Jk2 log(n)/n up to o(n ') (3.12) 



(f *) /s Ki(n)— concentrated right ofy„ + -y/A;2 log(n)/n m/? to o(n ') (3.13) 

More precisely, if $\sup\psi < [>]\ -\inf\psi$, the maximal MSE is achieved by contaminations according to (3.12) [(3.13)]. In case $\sup\psi = -\inf\psi$, (3.12) [(3.13)] applies if

VI > [<] - |(|('-2 + 3)(1 + ^ - ^) + 3(1 - |)) (3.14) 

If $\sup\psi = -\inf\psi$ and there is "$=$" in (3.14), (3.12) and (3.13) generate the same risk up to order $o(n^{-1})$.
Remark 3.6 (a) Curiously, although of corresponding order, no $\rho_0$ [$\kappa_0$] term shows up in the correction term $A_1$ [$A_2$], which is probably due to the special loss function.

(b) As announced, for $r = 0$, the approximation is one order faster than under contamination.

(c) The maximal MSE on $\mathcal{Q}_n$ is always underestimated by f-o asymptotics, as maximality always forces $A_1$ to be non-negative.

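(d) Let $Q_n^0$ be any distribution in $\mathcal{Q}_n$ attaining maximal risk in Theorem 3.5. Under symmetry, or more specifically if $l_2 = v_1 = \rho_0 = 0$, (3.9) becomes

$$n\,\mathrm{E}_{Q_n^0}[S_n^2] = (r^2b^2 + v_0^2)\Big(1 + \frac{r}{\sqrt{n}}\Big) + \frac{r}{\sqrt{n}}\,b^2(1 + r^2) + O(n^{-1}) \qquad (3.15)$$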
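Under symmetry, (3.15) gives the s-o approximation explicitly. The sketch below evaluates its first- and second-order terms for the Hampel IC $\eta_c$ under $\mathcal{N}(0, 1)$, with $v_0^2 = \mathrm{E}\,\eta_c^2$ in closed form and $b = c/(2\Phi(c) - 1)$; it drops the $O(n^{-1})$ term, and the function names are ours. Under these assumptions the values agree with the $n^0$ and $n^{-1/2}$ asymptotics columns of Tables 1, 2 and 4 (e.g. 1.342 for $c = 0.7$, $r = 0.1$, $n = 5$).

```python
import math

def hampel_b_v0sq(c):
    """b = sup|eta_c| and v0^2 = E eta_c^2 for the Hampel IC eta_c under F = N(0,1)."""
    erf_c = math.erf(c / math.sqrt(2.0))                  # 2 Phi(c) - 1
    tail = 0.5 * math.erfc(c / math.sqrt(2.0))            # 1 - Phi(c)
    dens = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    v0sq = (2.0 * c * c * tail + erf_c - 2.0 * c * dens) / erf_c ** 2
    return c / erf_c, v0sq

def mse_expansion_symmetric(c, r, n):
    """First- and second-order terms of (3.15):
    (r^2 b^2 + v0^2)(1 + r/sqrt(n)) + r/sqrt(n) * b^2 (1 + r^2)."""
    b, v0sq = hampel_b_v0sq(c)
    fo = r * r * b * b + v0sq
    so = fo * (1.0 + r / math.sqrt(n)) + r / math.sqrt(n) * b * b * (1.0 + r * r)
    return fo, so
```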

(e) In the ideal Gaussian location model (i.e., $r = 0$), plugging in the (limiting) results for $c \downarrow 0$ from section 3, the RHS of (3.15) becomes

^ [1 - ^)] + o(«-') = 1.5708(1 - + o(„-i) (3.16) 



2\ 2 3 

suggesting an overestimation of the risk by the f-o asymptotics. This is to be compared to the result for the median for odd sample size from Ruckdeschel [25]:



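$$n\,\mathrm{E}_F[\mathrm{Med}_n^2] = \frac{\pi}{2}\Big(1 + \frac{1}{n}\Big(\frac{\pi}{2} - 2\Big)\Big) + o(n^{-1}) = 1.5708\Big(1 - \frac{0.4292}{n}\Big) + o(n^{-1}) \qquad (3.17)$$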



so f-o asymptotics indeed overestimates the risk. The difference between (3.16) and (3.17) is due to the failure of condition (C).

(f) Relevance for the fixed neighborhood approach: If you consider the fixed neighborhood approach (of radius $\varepsilon$) and formally plug $r = \varepsilon\sqrt{n}$ into (3.9), you obtain the following approximation for the unstandardized maximal MSE on the thinned-out (fixed-radius) neighborhood:

MSE(S,„e,eo) = e-fo^ + [2 b'^ ± h ] + s'^ [{^ ij + jh ±3hb^ +3fo-) + 

+ -vl + -[voM±(4vi +3l2)b+ l) + b^ + — \5b^±3l2h^ + 
n n' ^ ' ' « ' 

W((3v2 + 3vi -I- f /2 + 2/3 -I- \2vxh)b^ ■¥ 1 ±(8vi +6/2)^)] +R„ (3.18) 

for some remainder $R_n$, the order of which, however, is uncertain; it should be valid for small $\varepsilon$, and is at
least of order 0(1/h^) + O(e^). These terms once more show that for the fixed-neighborhood approach, 
already for moderate sample sizes, bias becomes dominant, i.e., in our case we end up with the median as the optimal procedure.



3.1 Cross-checks 

3.1.1 Check with results by Fraiman et al. 

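In the symmetric case, the first cross-check comes with the asymptotic formulas for the variance $\mathrm{asVar}(\psi)$ and the (maximal) bias $\mathrm{asBias}(\psi)$ as found in Fraiman et al. [8], where we have to identify $\varepsilon = r/\sqrt{n}$. Here, with $\psi_B := \psi(\cdot - B)$, $B(\psi) := \mathrm{asBias}(\psi)/\sqrt{n}$ is defined as the zero of $B \mapsto (1 - \varepsilon)\int \psi_B\, dF + \varepsilon b$, and $\mathrm{asVar}(\psi) := V_1/V_2^2$ for $V_1 = (1 - \varepsilon)\int \psi_{B(\psi)}^2\, dF + \varepsilon b^2$ and $V_2 = (1 - \varepsilon)\int \psi'_{B(\psi)}\, dF$. Assuming that $\int \psi'_{B(\psi)}\, dF = -L'(B(\psi))$ and using that $\int \psi_{B(\psi)}\, dF = -B(\psi) + o(B(\psi)^2)$, $\int \psi_{B(\psi)}^2\, dF = V(B(\psi))^2 + L(B(\psi))^2 = v_0^2(1 + o(B))$, and $L'(B(\psi)) = -1 + o(B)$, we get

$$\mathrm{asBias}(\psi) = \sqrt{n}\, b\varepsilon\,\big(1 + \varepsilon + o(\varepsilon)\big) = rb\Big(1 + \frac{r}{\sqrt{n}} + o(n^{-1/2})\Big) \qquad (3.19)$$

$$\mathrm{asVar}(\psi) = (1 + \varepsilon)v_0^2 + \varepsilon b^2 + o(\varepsilon) = v_0^2 + \frac{r}{\sqrt{n}}\,(v_0^2 + b^2) + o(n^{-1/2}) \qquad (3.20)$$

and hence, in accordance with formula (3.9),

$$\mathrm{asMSE}(\psi) = (v_0^2 + r^2b^2)\Big(1 + \frac{r}{\sqrt{n}}\Big) + \frac{r}{\sqrt{n}}\,b^2(1 + r^2) + o(n^{-1/2}) \qquad (3.21)$$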
3.1.2 Check with higher order asymptotics for the median

The second check comes with the higher order asymptotics for the median from Ruckdeschel [25]. In a first step, we assume that, with $f_0 > 0$ and some $\delta \in (0, 1]$,

$$f(t) = f_0 + f_1 t + O(t^{1+\delta}) \qquad (3.22)$$

As for the median, $\psi_{\mathrm{Med}}(x) = \mathrm{sign}(x)/(2f_0)$, we have $v_0 = b = 1/(2f_0)$ and $\varepsilon_0 = 1/2$. For the moment we ignore the fact that conditions (C)/(C') are not fulfilled. Easy calculations give $l_2 = -f_1/f_0$, $v_1 = 0$, $\rho_0 = 0$, so that with our formula (3.9) we obtain for odd sample size $n$

R,Mu^„,r, 5)=-^((l + r^)[l + f„^('-^ + 3)) + o(n-''^) (3.23) 



in complete agreement with Ruckdeschel [25]. As a next step we compare this to t-o asymptotics to be obtained by (3.9), again ignoring condition (C). With $f_2 := f''(0)$ we get $l_3 = -f_2/f_0$, $v_2 = -4f_0^2$, $\rho_1 = 4f_0$, and hence for odd sample size $n$, after some reordering,

^„(<AMe.„,r, i) I o(i) + ^|(1 + r2) + ^(2(1 + r2) + ^^|) + 



and it is just the framed term, entering through the $\rho_1$-term of (3.11), which causes a difference to the result of Ruckdeschel [25], where we get the value 1 instead. This discrepancy, however, is in fact due to the failure of condition (C), because Theorem B.2, which we need to prove (3.9), is not available in this case.



4 Relations to other approaches

Of course, the idea of assessing the quality and speed of convergence of CLT-type arguments by means of higher order asymptotics is common in mathematical statistics; cf., among others, Ibragimov and Linnik [16], Bhattacharya and Rao [3], Pfanzagl [20], Hall [10], Barndorff-Nielsen and Cox [2], and Taniguchi and Kakizawa [30]. Asymptotic expansions of the moments of statistical estimators (like the MSE in our case) have already been studied by Gusev [9] and Pfaff [19]; both approaches, however, only consider the ideal model and work with pointwise expansions of the likelihood.

Also the idea to improve convergence by means of saddlepoint techniques and conjugate densities, respectively, has been a large success in this context, cf. Daniels [5], Hampel [11], Field and Ronchetti [7].

Our approach is simpler in the sense that, instead of approximating the c.d.f. or the density of our procedures on the whole range of arguments, we directly approximate our risk. Doing so, we do not run into problems of bad approximations in the tails of a distribution, because everything relevant for our risk occurs within a (decreasing) compact; using saddlepoint techniques, we would have to solve the saddlepoint equation for a grid of evaluation points $t$ to get an accurate estimate of the density, which makes the corresponding solution less explicit than ours.
Even more importantly, note that a highly accurate approximation of the distribution of the M-estimator would not suffice to enforce uniform convergence of the MSE, which was the reason for our modification (2.3) of the neighborhoods. Also, contrary to "usual" small sample asymptotics, our approach does not require a particular contamination to be assumed from the beginning; rather, we identify a least favorable one within the proof.

In the setup of saddlepoint approximations, one would apply Field and Ronchetti [7, Theorem 4.3], which at least covers the Hampel-type solutions. The pointwise formulation of assumption A4.2 therein, i.e., there exists an open subset $U \subset \mathbb{R}$ such that (i) for each $\theta \in \mathbb{R}$, $F(U - \theta) = 1$ and (ii) $D\psi$, $D^2\psi$, $D^3\psi$ exist on $U$, seems problematic, however, as it allows for pathological $\psi$-functions defined similarly to the Cantor distribution function (while $F$ may be something like $\mathcal{N}(0, 1)$), for which the interchange of differentiation and integration becomes awkward. As may be read off from (3.9), in the ideal model, as for the saddlepoint approach, we, too, get an expansion of order $1/n$, a fact which is not due to symmetry of $\Lambda$ and/or $\psi$! So in fact we get the same approximation quality as with the saddlepoint approach; indeed, by the Taylor-expansion step in section A.3, we extract an argument to be expanded from the exponential, which also is an idea behind the saddlepoint approximation, cf. Field and Ronchetti [7, p. 26]. On the other hand, even in the restricted neighborhoods of (2.3), it is not clear to the present author whether, in general, the saddlepoint approximation holds uniformly in $t$, so it is not clear whether an improved approximation of the density will result in a better approximation of the risk. A detailed empirical and numerical investigation of such questions is contained in Ruckdeschel and Kohl [26].



5 A simulation study and numerical evaluations

Before starting with the theoretical findings, we summarize the results of a simulation study that actually led us to the closer examination of the higher order expansions of the MSE.



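5.1 Simulation design

Under R, cf. R Development Core Team [21], we simulated $M = 10000$ runs of sample size $n = 5, 10, 30, 50, 100$ in the ideal location model $\mathcal{P} = \mathcal{N}(\theta, 1)$ at $\theta = 0$. In a contaminated situation, we used observations stemming from

$$Q_n = \mathcal{L}\Big\{\big[(1 - U_i)X_i^{\mathrm{id}} + U_i X_i^{\mathrm{di}}\big]_i \;\Big|\; \sum_i U_i \le \lceil n/2 \rceil - 1\Big\} \qquad (5.1)$$

for $U_i \sim \mathrm{Bin}(1, r/\sqrt{n})$, $X_i^{\mathrm{id}} \sim \mathcal{N}(0, 1)$, $X_i^{\mathrm{di}} \sim \mathrm{I}_{\{100\}}$ (the Dirac measure in 100), all stochastically independent, and for contamination radii $r = 0.1, 0.25, 0.5, 1.0$.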

As estimators we considered the median (with the mid-point variant for even sample size) and M-estimators to Hampel-type ICs $\eta_c$ of form (1.6) with clipping heights $c = 0.5, 0.7, 1, 1.5, 2$ and $c_0(r)$, the f-o-o clipping height according to (1.6). All empirical MSEs come with asymptotic 95% confidence intervals, which are based on the CLT for the variables

$$\mathrm{empMSE}_n = \frac{n}{M} \sum_{j=1}^{M} \big[S_n(\mathrm{sample}_j)\big]^2 \qquad (5.2)$$

Note that with respect to (3.12)/(3.13) and the considered estimators, a contamination point 100 will largely suffice to attain the maximal MSE on $\mathcal{Q}_n$.
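A compact Monte Carlo sketch of this design, under simplifying assumptions of ours: $\psi$ is the clipped identity (the factor $A_c$ does not change the estimator), the M-estimator is computed by bisection, the conditioning in (5.1) is implemented by rejection, and $M$ is far smaller than 10000 to keep the run short. It is not the R code used for the tables, and all names are illustrative.

```python
import math
import random

def psi(u, c):
    """Clipped identity; the factor A_c of eta_c does not change the M-estimator."""
    return max(-c, min(c, u))

def m_estimate(xs, c, iters=40):
    """A root of t -> sum_i psi(x_i - t) by bisection on [min x - 1, max x + 1]."""
    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid, c) for x in xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_sample(n, r, rng, x_di=100.0):
    """One sample from (5.1): Dirac contamination in x_di, given sum U_i <= ceil(n/2) - 1."""
    p = min(r, math.sqrt(n)) / math.sqrt(n)
    while True:                                    # rejection step implementing the conditioning
        us = [rng.random() < p for _ in range(n)]
        if sum(us) <= math.ceil(n / 2) - 1:
            return [x_di if u else rng.gauss(0.0, 1.0) for u in us]

def empirical_mse(n, r, c, M=1000, seed=1):
    """empMSE_n of (5.2) (theta_0 = 0) with an asymptotic 95% CLT confidence interval."""
    rng = random.Random(seed)
    vals = [n * m_estimate(draw_sample(n, r, rng), c) ** 2 for _ in range(M)]
    mean = sum(vals) / M
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (M - 1))
    half = 1.96 * sd / math.sqrt(M)
    return mean, (mean - half, mean + half)
```

With $n = 30$ and $c = 0.7$, the ideal and the $r = 0.5$ runs should land near the values around 1.18 and 2.17 reported in Table 2, up to the larger Monte Carlo error of the small $M$.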

5.2 Numerical evaluations

By means of relations (1.9), we may reduce the problem of finding the exact distribution of our M-estimators to the calculation of the "exact" distribution of $\sum_i \psi_t(X_i)$. For this purpose, we may apply the general convolution algorithm for arbitrarily distributed real-valued random variables introduced in Ruckdeschel and Kohl [27]. This algorithm is based on the FFT, respectively the discrete Fourier transform (DFT), and is implemented in R within the package distr available on CRAN, see Ruckdeschel et al. [28], Ruckdeschel et al. [29].

In Ruckdeschel and Kohl [26], to increase accuracy for M-estimators to Hampel ICs, we extend our algorithm from distr to (a) better cope with mass points in $\pm b$ and (b) calculate the "exact" finite-sample maximum MSE on $\mathcal{Q}_n$. Here we confine ourselves to attaching extra columns "numeric" to the following tables summarizing our simulation; "numeric" then stands for application of Algorithm C respectively Algorithm D from Ruckdeschel and Kohl [26].

More specifically, for "exact" terms, as worked out in Algorithm C (ibid.), we have to take into account that, after conditioning w.r.t. the event that the number $K$ of contaminations in the sample is less than half the sample size, the switching variables $U_i$ from (1.3) are no longer independent. So we may only apply the FFT-based algorithm from Ruckdeschel and Kohl [26] to an absolutely continuous inner part and have to calculate the rest by explicitly summing up the events; for details see the cited reference and the R program available on the web page to this article.

On the other hand, Algorithm D uses the fact that, by the exponential negligibility shown in section 2, the dependency of the $U_i$ may be ignored for $n$ sufficiently large; in our case this was possible for $n > 30$, moderate radius $r$, and robust clipping height $c$. Then we may simply determine the convolutions of the corresponding distributions of the summands directly by Algorithm 4.4 from Ruckdeschel and Kohl [27].

To demonstrate the negligibility, for $n \le 30$, we calculate both "exact" terms (Algorithm C) and those obtained by superposition of the a.c. part and the random walk, ignoring all mass points of the law of the sum (Algorithm D).

5.3 Results

A more detailed account of the results of the simulation study in tables may be found on the web page to this article. Here we only present a few results which led to the subsequent investigation.
Table 1 emp., num., and as. MSE at r = 0.1, c = 0.7

  n   situation   simulation S_n [low; up]    numeric            asymptotics
                                              Algo C   Algo D    n^0     n^-1/2   n^-1
  5   id          1.147 [1.114; 1.179]        1.172    1.168     1.187   1.187    1.169
      cont        1.403 [1.359; 1.447]        1.434    1.535     1.205   1.342    1.345
 10   id          1.179 [1.139; 1.205]        1.177    1.174     1.187   1.187    1.178
      cont        1.331 [1.292; 1.369]        1.327    1.326     1.205   1.302    1.303
 30   id          1.209 [1.175; 1.242]        1.183    1.180     1.187   1.187    1.184
      cont        1.301 [1.264; 1.337]        1.265    1.262     1.205   1.261    1.261
 50   id          1.192 [1.158; 1.225]        -        1.181     1.187   1.187    1.185
      cont        1.250 [1.214; 1.285]        -        1.247     1.205   1.248    1.249
100   id          1.161 [1.128; 1.193]        -        1.182     1.187   1.187    1.186
      cont        1.212 [1.178; 1.246]        -        1.232     1.205   1.236    1.236



Table 2 emp., num., and as. MSE at r = 0.5, c = 0.7

  n   situation   simulation S_n [low; up]    numeric             asymptotics
                                              Algo C   Algo D     n^0     n^-1/2   n^-1
  5   id          1.166 [1.134; 1.199]        1.172    1.168      1.187   1.187    1.169
      cont        2.989 [2.892; 3.087]        3.016    12.491     1.647   2.529    3.103
 10   id          1.191 [1.157; 1.224]        1.177    1.174      1.187   1.187    1.178
      cont        2.934 [2.836; 3.032]        2.840    4.820      1.647   2.271    2.557
 30   id          1.194 [1.161; 1.227]        1.183    1.180      1.187   1.187    1.184
      cont        2.183 [2.119; 2.247]        2.167    2.167      1.647   2.007    2.102
 50   id          1.165 [1.133; 1.197]        -        1.181      1.187   1.187    1.185
      cont        1.946 [1.893; 1.998]        -        2.008      1.647   1.926    1.983
100   id          1.192 [1.159; 1.226]        -        1.182      1.187   1.187    1.186
      cont        1.894 [1.844; 1.944]        -        1.879      1.647   1.844    1.873



5.3.1 Fixed procedure, fixed radius

To get an idea of the speed of convergence of the MSE to its asymptotic values, we consider the H07-estimator from Andrews et al. [1], i.e., the M-estimator to $\eta_{0.7}$, at $r = 0.1$ and at $r = 0.5$ for different sample sizes $n$.

The simulated empirical risk comes with an (empirical) 95% confidence interval and is compared to the corresponding numerical approximations and to the f-o, s-o, and t-o asymptotics from Theorem 3.5. Corresponding tables for the f-o-o M-estimator to $\eta_{c_0}$ may be drawn from the web page to this article. The results are tabulated in Tables 1 and 2. In Table 3 we consider the relative MSE, calculated as the quotient $\mathrm{MSE}(c, r)/\mathrm{MSE}(c_0(r), r)$. This is a natural expression to compare the efficiency of different procedures. We compare the empirical terms from the simulation to the corresponding numerical approximations and to the asymptotic terms derived by means of Theorem 3.5. We already recognize a very good approximation down to very small sample sizes.
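The first-order part of the relative MSE in Table 3 can be recomputed from (1.5) and (1.6) alone. The sketch below assumes $F = \mathcal{N}(0, 1)$, solves (1.6) for $c_0(r)$ by bisection, and divides $r^2b^2 + v_0^2$ at $c$ by the same quantity at $c_0(r)$; with `ideal=True` both are evaluated at radius 0 while keeping $c_0(r)$. Names are ours. Under these assumptions it returns about 1.143 and 1.173 for $c = 0.7$, $r = 0.1$, which are the first asymptotics entries of Table 3.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def c0(r):
    """f-o-o clipping height: root of E[(|X| - c)_+] = r^2 c, X ~ N(0,1), cf. (1.6); needs r > 0."""
    g = lambda c: 2.0 * (phi(c) - c * (1.0 - Phi(c))) - r * r * c   # decreasing in c
    lo, hi = 1e-9, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def fo_mse(c, r):
    """First-order asymptotic maximal MSE r^2 b^2 + v0^2 of eta_c, cf. (1.5)."""
    e = 2.0 * Phi(c) - 1.0
    v0sq = (2.0 * c * c * (1.0 - Phi(c)) + e - 2.0 * c * phi(c)) / e ** 2
    return (r * c / e) ** 2 + v0sq

def rel_fo(c, r, ideal=False):
    """MSE(c, r) / MSE(c0(r), r) in first order; ideal=True evaluates both at radius 0."""
    rr = 0.0 if ideal else r
    return fo_mse(c, rr) / fo_mse(c0(r), rr)
```

Because $c_0(r)$ minimizes the first-order risk at radius $r$, the contaminated ratio is insensitive to small errors in $c_0(r)$, whereas the ideal-model ratio depends on it more directly.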

5.3.2 Fixed procedure, fixed sample size 



In order to study the effect of the radius on the quality of the approximation, we consider the M-estimator to $\eta_{0.5}$ at sample size $n = 30$ at varying radii. The results are
Table 3 emp., num., and as. relMSE at r = 0.1, 0.5, c = 0.7 relative to Var[X̄_n] for id and MSE(c_0(r)) for cont

                          r = 0.1                              r = 0.5
  n   situation   sim     num ex/*   asymptotics      sim     num ex/*   asymptotics
  5   id          1.161   1.163      1.173  1.173     1.038   1.042      1.041  1.041
      cont        1.003   0.956      1.143  1.039     0.992   0.978      1.006  0.989
 10   id          1.167   1.166      1.173  1.173     1.037   1.041      1.041  1.041
      cont        1.049   1.029      1.143  1.065     0.993   0.977      1.006  0.992
 30   id          1.174   1.170      1.173  1.173     1.037   1.041      1.041  1.041
      cont        1.094   1.086      1.143  1.095     0.994   0.993      1.006  0.997
 50   id          1.160   1.169*     1.173  1.173     1.038   1.041*     1.041  1.041
      cont        1.096   1.096*     1.143  1.105     0.996   0.995*     1.006  0.999
100   id          1.180   1.170*     1.173  1.173     1.044   1.041*     1.041  1.041
      cont        1.122   1.110*     1.143  1.116     0.999   0.999*     1.006  1.001



Table 4 emp., num., and as. MSE at n = 30, c = 0.5

  r      simulation S_n [low; up]    numeric            asymptotics
                                     Algo C   Algo D    n^0     n^-1/2   n^-1
  0.00   1.272 [1.237; 1.307]        1.259    1.256     1.263   1.263    1.259
  0.10   1.374 [1.336; 1.413]        1.337    1.335     1.280   1.334    1.334
  0.25   1.545 [1.502; 1.588]        1.545    1.542     1.369   1.514    1.532
  0.50   2.204 [2.139; 2.268]        2.189    2.187     1.689   2.037    2.128
  1.00   5.362 [5.219; 5.505]        5.238    5.265     2.967   4.132    4.652



tabulated in Table [4] The simulations and the numeric values clearly show that with 
increasing radius, the approximation quality of f-o asymptotics decreases, which is 
conformal to the infinitesimal character of our neighborhoods. A corresponding table 
for the more liberal M-estimator to 772 at sample size n = 50 may be drawn from the 
web-page. 
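The first-order ($n^0$) entries of Table 4 are $v_0^2 + r^2b^2$ for the standardized Huber IC $\eta_c = A_c\psi_c$. Below is a minimal sketch that assumes $F = \mathcal N(0,1)$. It uses only the closed forms $A_c = 1/(2\Phi(c) - 1)$, $b = A_cc$ and $v_0^2 = A_c^2\,\mathrm E[\psi_c^2]$. The function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def huber_first_order_nmse(c, r):
    """First-order maximal n*MSE, v0^2 + r^2 b^2, of the M-estimator to psi_c at F = N(0, 1).

    Standardization eta_c = A_c psi_c with A_c = 1 / (2 Phi(c) - 1),
    so that b = A_c c and v0^2 = A_c^2 E[psi_c(X)^2].
    """
    p = 2.0 * std_normal_cdf(c) - 1.0  # E[psi_c'(X)] = P(|X| <= c)
    # E[psi_c^2] = E[X^2; |X| <= c] + c^2 P(|X| > c)
    e_psi_sq = p - 2.0 * c * std_normal_pdf(c) + 2.0 * c * c * (1.0 - std_normal_cdf(c))
    a_c = 1.0 / p
    b = a_c * c
    v0_sq = a_c * a_c * e_psi_sq
    return v0_sq + r * r * b * b


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(r, round(huber_first_order_nmse(0.5, r), 3))
```

With $c = 0.5$ this gives 1.263, 1.280, 1.689 and 2.967 for $r = 0, 0.1, 0.5, 1.0$, which matches the corresponding first-order entries of Table 4. The higher-order columns require the constants of Theorem 3.5 and are not reproduced here.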



5.3.3 Fixed radius, fixed sample size 

In this paragraph we compare M-estimators with different clipping heights and check whether the choice $c_0$ may also be considered reasonable for moderate $n$. To this end, we consider the situation $r = 0.25$ and $n = 30$. The results are tabulated in Tables 5 and 6. The simulations already indicate that the answer should be affirmative. The numeric and asymptotic values for the median are taken from Ruckdeschel [25]. Corresponding tables for the situation $r = 0.5$ and $n = 100$ are on the web-page.
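Whether $c_0$ is a reasonable choice can be related to how $c_0(r)$ is obtained. Consider minimizing the first-order risk $v_0^2 + r^2b^2$ over the Huber family at $F = \mathcal N(0,1)$. One can check by differentiating in $c$ that this leads to the equation $r^2c = \mathrm E[(|X| - c)_+] = 2(\varphi(c) - c(1 - \Phi(c)))$. At its root the risk simplifies to $A_{c_0} = 1/(2\Phi(c_0) - 1)$. The sketch below solves this equation by bisection; its function names are hypothetical. For $r = 0.25$ it should recover $c_0 = 1.3393$ and the first-order value 1.220 of Table 5.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def optimal_clipping(r, lo=1e-8, hi=20.0, iters=200):
    """Root c of r^2 c = E[(|X| - c)_+] = 2 (phi(c) - c (1 - Phi(c))), for r > 0.

    g(c) below is strictly decreasing with g(0+) > 0 and g(hi) < 0,
    so plain bisection converges to the unique root.
    """
    def g(c):
        return 2.0 * (std_normal_pdf(c) - c * (1.0 - std_normal_cdf(c))) - r * r * c

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def first_order_nmse(c, r):
    """v0^2 + r^2 b^2 for the standardized Huber IC at N(0, 1)."""
    p = 2.0 * std_normal_cdf(c) - 1.0
    e_psi_sq = p - 2.0 * c * std_normal_pdf(c) + 2.0 * c * c * (1.0 - std_normal_cdf(c))
    return (e_psi_sq + r * r * c * c) / (p * p)


if __name__ == "__main__":
    c0 = optimal_clipping(0.25)
    print(round(c0, 4), round(first_order_nmse(c0, 0.25), 4))
```

Clipping heights away from $c_0$ give a larger first-order risk. This is the asymptotic counterpart of the comparison in Tables 5 and 6.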



Table 5 emp., num., and as. MSE at n = 30, r = 0.25

| estimator | situation | simulation S_n | [low; up] | num ex | as. n^0 | as. n^-1/2 | as. n^-1 |
|---|---|---|---|---|---|---|---|
| Med | id | 1.492 | [1.451; 1.532] | 1.501 | 1.571 | 1.571 | 1.496 |
| Med | cont | 1.786 | [1.736; 1.835] | 1.779 | 1.669 | 1.821 | 1.767 |
| c = 0.5 | id | 1.250 | [1.216; 1.284] | 1.259 | 1.263 | 1.263 | 1.259 |
| c = 0.5 | cont | 1.545 | [1.502; 1.588] | 1.545 | 1.369 | 1.514 | 1.532 |
| c = 1.0 | id | 1.092 | [1.062; 1.122] | 1.105 | 1.107 | 1.107 | 1.105 |
| c = 1.0 | cont | 1.433 | [1.393; 1.473] | 1.440 | 1.241 | 1.402 | 1.425 |
| c = 2.0 | id | 0.991 | [0.963; 1.018] | 1.010 | 1.010 | 1.010 | 1.010 |
| c = 2.0 | cont | 1.611 | [1.566; 1.656] | 1.633 | 1.285 | 1.556 | 1.604 |
| c = c_0 = 1.3393 | id | 1.035 | [1.006; 1.063] | 1.051 | 1.139 | 1.053 | 1.052 |
| c = c_0 = 1.3393 | cont | 1.438 | [1.398; 1.479] | 1.452 | 1.220 | 1.405 | 1.434 |

Table 6 emp., num., and as. relMSE at n = 30, r = 0.25, relative to Var[X_n] for id and MSE(c_0(r)) for cont, c_0(r) = 1.3393

| estimator | situation | simulation | numeric ex | as. n^0 | as. n^-1/2 |
|---|---|---|---|---|---|
| Med | id | 1.435 | 1.427 | 1.379 | 1.379 |
| Med | cont | 1.241 | 1.224 | 1.320 | 1.263 |
| c = 0.5 | id | 1.202 | 1.197 | 1.199 | 1.198 |
| c = 0.5 | cont | 1.073 | 1.064 | 1.077 | 1.068 |
| c = 1.0 | id | 1.051 | 1.051 | 1.051 | 1.051 |
| c = 1.0 | cont | 0.995 | 0.991 | 0.998 | 0.994 |
| c = 2.0 | id | 0.953 | 0.960 | 0.959 | 0.960 |
| c = 2.0 | cont | 1.119 | 1.125 | 1.107 | 1.119 |

5.3.4 Relative error compared to numerically exact risk

Figure 1 gives a closer look at the relative error of our higher order asymptotics w.r.t. the numerically exact risk $\mathrm{MSE}_n$. A zoom-in for $n > 16$ is available on the web-page. For all investigated radii $r = 0.00, 0.10, 0.25, 1.00$, the relative error of our asymptotic formula w.r.t. the corresponding numeric figures indeed decreases quickly in absolute value in $n$. We also notice a certain oscillation between odd and even sample sizes for very small $n$, which is explained by the fact
that for even $n$ there may be ties. By Lemma 1.1, however, the contribution of these ties to the risk decays exponentially in $n$.

In Table 7 we have determined the smallest sample size $n_0$ such that for $n > n_0$ the relative error of first to third order asymptotics in approximating $\mathrm{MSE}_n(\psi_c)$ for $c = 0.7$ is smaller than 1% resp. 5%. This shows that for $r \le 0.5$ we need no more than 25 (60) observations to stay within an error corridor of 5% (1%) in t-o asymptotics. For f-o asymptotics, however, we need considerable sample sizes for reasonable approximations unless the radius is rather small.

The figures in this table are to be taken "cum grano salis". Numerical inaccuracies in $\mathrm{MSE}_n$ w.r.t. the exact risk are of order 1e-5, which may result in a deviation from the "real" $n_0$ of ±2 for $n_0 < 200$.
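The determination of $n_0$ in Table 7 can be sketched as follows. We assume that the relative errors are available on a grid of sample sizes and read "for $n > n_0$" as "for every tabulated $n \ge n_0$"; this convention is our own. The code scans from the largest $n$ downwards, which makes the odd/even oscillation harmless: an accurate small $n$ does not count unless all larger tabulated $n$ are accurate too. The function name is hypothetical.

```python
def minimal_n0(ns, exact, approx, tol):
    """Smallest tabulated n0 such that |approx/exact - 1| < tol for all tabulated n >= n0.

    Returns None if even the largest tabulated n violates the tolerance.
    """
    n0 = None
    # walk from the largest sample size downwards and stop at the first violation
    for n, ex, ap in sorted(zip(ns, exact, approx), reverse=True):
        if abs(ap / ex - 1.0) < tol:
            n0 = n
        else:
            break
    return n0


if __name__ == "__main__":
    ns = list(range(1, 101))
    exact = [1.0 + 1.0 / n for n in ns]  # toy "exact" risks
    approx = [1.0] * len(ns)             # toy first-order approximation
    print(minimal_n0(ns, exact, approx, 0.045))
```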
[Figure 1: four panels, r = 0.00, 0.10, 0.25 and 1.00, titled "maximal MSE for M-estimator to ψ_c with c = 0.7"; horizontal axis n from 20 to 100]

Fig. 1 The mapping $n \mapsto \mathrm{relerror}(\mathrm{MSE}_n(\psi_c))$ for $c = 0.7$ and $F = \mathcal N(0, 1)$.



Table 7 Minimal $n_0$ such that for $n > n_0$ the relative error of first to third order asymptotics in approximating $\mathrm{MSE}_n(\psi_c)$ for $c = 0.7$ is smaller than 1% resp. 5%

| rel. err. | order | r = 0.00 | r = 0.10 | r = 0.25 | r = 0.50 | r = 1.00 |
|---|---|---|---|---|---|---|
| 1% | 1st order asy. | 9 | >640* | >3927* | >14425* | >49220* |
| 1% | 2nd order asy. | 9 | 15 | 60 | 196 | >580* |
| 1% | 3rd order asy. | 5 | 15 | 30 | 59 | 146 |
| 5% | 1st order asy. | 3 | 28 | 162 | >590* | >1995* |
| 5% | 2nd order asy. | 3 | 6 | 17 | 43 | 119 |
| 5% | 3rd order asy. | 3 | 6 | 12 | 23 | 49 |

*: for $n > 200$ the computation of $\mathrm{MSE}_n$ gets too expensive in time; instead we use the corresponding t-o figure. We assume an error of t-o asymptotics of order $O(n^{-3/2})$. A corresponding regression onto the error term gives estimates for the regression coefficient of the term $n^{-3/2}$ of about -50, -166, -534, and -1940 for $r = 0.1, 0.25, 0.5$, and $1.0$. The error incurred by this replacement (read from top to bottom and then left to right) is therefore about -3e-3, -7e-4, -3e-4, -2e-2, -2e-2, -1.3e-1, and -2e-4.
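The regression in the footnote can be illustrated by a least-squares fit of the t-o error against $n^{-3/2}$. This is a hedged sketch. It assumes a regression through the origin (no intercept), which the footnote does not specify, and the function names are hypothetical. Extrapolating with the coefficient -50 for $r = 0.1$ to $n = 640$ gives about $-3.1\cdot 10^{-3}$, which is the order of the first correction quoted above.

```python
import numpy as np


def fit_n32_coefficient(ns, errors):
    """Least-squares beta in errors ~ beta * n**(-3/2), regression through the origin."""
    x = np.asarray(ns, dtype=float) ** -1.5
    y = np.asarray(errors, dtype=float)
    return float(x @ y / (x @ x))


def extrapolated_error(beta, n):
    """Predicted t-o error beta * n**(-3/2) at a (large) sample size n."""
    return beta * n ** -1.5


if __name__ == "__main__":
    ns = np.arange(20, 201)
    beta = fit_n32_coefficient(ns, -50.0 * ns ** -1.5)
    print(beta, extrapolated_error(beta, 640))
```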
6 Ramifications

6.1 Ideal distributions with polynomially decaying tails

In order to cover ideal distributions with polynomially decaying tails, we sharpen the restriction of the original neighborhood system $Q_n(r; \varepsilon_0)$ from (2.3) to



$$Q_n' = \Big\{\mathcal L\big[\big((1-U_i)X_i^{id} + U_iX_i^{di}\big)_i\big] \;\Big|\; \sum_{i=1}^n U_i < n\,\varepsilon_0' \Big\} \tag{6.1}$$

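for some fixed $\varepsilon_0'$ such that

$$0 < \varepsilon_0' < \varepsilon_0 \tag{6.2}$$

This gives the new neighborhood system $Q_n'(r; \varepsilon_0')$. Correspondingly, we will consider the asymptotics of

$$R_n'(S_n, r; \varepsilon_0') := \sup_{Q_n \in Q_n'(r;\varepsilon_0')} n \int |S_n - \theta|^2 \, dQ_n \tag{6.3}$$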

It is not surprising that all results up to this point on maximal risks are unaffected by this subtle modification. But we may now replace assumption (Vb) by

(Pd) There are some $T > 0$ and $\eta > 0$ such that

$$F(t) \ge 1 - t^{-\eta} \quad \text{for } t > T, \qquad F(t) \le (-t)^{-\eta} \quad \text{for } t < -T \tag{6.4}$$



Proposition 6.1 In the location model of Subsection 1.1 assume (bmi), (D), and (C) from Section 1; additionally assume that the central distribution $F$ satisfies (6.4). Then, on $Q_n'(r; \varepsilon_0')$, the assertions of Theorem 3.5 (with any $k_2 > 2$) continue to hold.

Property (6.4) can be made plausible by the following proposition:



Proposition 6.2 In the location model of Subsection 1.1 assume that for any $d > 0$

$$\liminf_{t \to \infty} t^d \big(1 - F(t)\big) > 0 \quad \text{or} \quad \liminf_{t \to \infty} t^d F(-t) > 0 \tag{6.5}$$

Then for any sample size $n$, the MSE of the M-estimator $S_n$ to any IC $\psi$ according to (bmi) in the ideal model is infinite.

Conditions (3.12) resp. (3.13) almost characterize the risk-maximizing contaminations:



Proposition 6.3 Under the assumptions of Theorem 3.5 let $\delta, c_0 > 0$. Assume that $\bar b = b$ and let $B_n := \inf\{x \mid \psi(x) > b - c_0/\sqrt n\}$. Assume further that, for $K = \sum_{i=1}^n U_i$ and $k > (1 - \delta) r\sqrt n$,

$$\Pr\Big(\sum_{i=1}^n U_i\, I\big(X_i^{di} < B_n + v_0\sqrt{\log(n)/n}\big) \ge 1 \,\Big|\, K = k\Big) \ge p_0 > 0 \tag{6.6}$$

Then, eventually in $n$, for any such sequence of contaminations $Q_n^0 \in Q_n$, the maximal MSE in (3.9) under condition (3.13) (i.e. with positive bias) cannot be attained. More precisely,

$$R_n(S_n, r) - n\,\mathrm E_{Q_n^0} S_n^2 \ge 2 p_0 (r c_0 + b)/(n\sqrt{2\pi}) \tag{6.7}$$

A corresponding relation holds for condition (3.12).
6.2 Convergence of variance and bias separately



The technique used to derive Theorem 3.5 also applies if we are interested in variance and bias separately; we get

Proposition 6.4 Under Assumptions (bmi) to (C) and for sample size $n$, an M-estimator $S_n$ for score function $\psi$ under a measure $Q_n^0 \in Q_n(r; \varepsilon_0)$ according to (3.12) resp. (3.13) admits the following expansions

V^|Bias(5„, e;!) I = I r/7 + ^ Bi,,, + ^ + ^ B2 | + o(«-') (6.8) 

$$n\,\mathrm{Bias}^2(S_n, Q_n^0) = r^2 b^2 + \frac{1}{\sqrt n} C_1 + \frac{1}{n} C_2 + o(n^{-1}) \tag{6.9}$$

$$n\,\mathrm{Var}(S_n, Q_n^0) = v_0^2 + \frac{1}{\sqrt n} D_1 + \frac{1}{n} D_2 + o(n^{-1}) \tag{6.10}$$

Bi.o = (ifc+vi)v2, Bi.i = fe(l±j/2fo) (6.11) 
B2 = [(^/i + ih)b^ +b± kh'-y + b(l ± \hb) + 

+[(i;3 + I /2 + V2 + v2 + 3 i-'i h)b ±{h± vijv'o^ (6.12) 

Ci = b^p-(±hb + 2) ± i)(;2 + 2vi)vg (6.13) 

C2 = (VI h+ll\+ V?)vo'* + [3 fo2 ± 3 /2 + (I /2 + 1 /3)fo4j/ + 

+[(J /2 + ^3 +2v2 +2v2 + 7i'i l2)b^vo^ ±{2l2 + 4vi)bvl + 2b- ±l2b^r'^ (6.14) 
Z)i = [±2(/2 + vi)Z;+ ijvo^ (6.15) 
= (/a + 5/2 + llvi/2 + 8v2 + 3v2)vo'' + (|pi + (/2 + 2vi)po)vo^ + 

[[ih + v'l + V2 + 5 VI k +4ll)b^ ± Aih + vi)b+ l)vo^ ± 2/2 fc' + 3Z)^]r^ (6.16) 

where we are in the − [+] case according to whether (3.12) or (3.13) applies.
For a proof of this proposition, we may proceed exactly as in the proof of Theorem 3.5. The only change is that in (A.38) we keep the integration domain and replace the integrand $u_1(s)^2\varphi(s)g_n(s)$ by $u_1(s)\varphi(s)g_n(s)$; we do not spell this out here. In MAPLE the expressions are obtained by means of our procedure asESi.

A Proofs

A.1 Proof of Lemma 1.2



Let $G_t$ be the law of $\psi_t(X^{id}) = \psi(X^{id} - t)$. By assumption, the Lebesgue decomposition yields $dG_0 = a\,g\,d\lambda + (1 - a)\,d\tilde G$ for $a \in (0, 1]$, $g$ some probability density and $\tilde G \perp \lambda$. The support of $g$ contains an open interval $(c_1, c_2)$ and $G_0(c_2) > G_0(c_1)$. On $(c_1, c_2)$, $\psi$ is strictly isotone and continuous, so that with $d_i = \psi^{-1}(c_i)$

$$P\big(\psi_t(X^{id}) \in (c_1, c_2)\big) = P(d_1 + t < X^{id} < t + d_2) = \int_{d_1+t}^{d_2+t} dF \tag{A.1}$$

But

$$\int_{d_1+t}^{d_2+t} dF = G_0(c_2) - G_0(c_1) + o(t^0) \tag{A.2}$$

so that for $t$ small enough, the absolutely continuous part of $G_t$ is uniformly bounded away from 0. Hence, by the Lebesgue Lemma, our condition (3.6) holds. □
A.2 Proof of Proposition 1.4



To get $\mathrm E[\eta_c\Lambda_f] = 1$, the Lagrange multiplier must be determined by $A_c^{-1} = 2\Phi(c) - 1$. It holds that $b = A_c c$. For $c \to \infty$ we obtain the classically optimal IC, and for $c \to 0$, l'Hospital's rule yields the IC of the sample median. As to $L(t)$, we obtain

$$L_c(t) = A_c\big[c - (c+t)\Phi(t+c) - (c-t)\Phi(t-c) + \varphi(t-c) - \varphi(t+c)\big], \quad L_\infty(t) = -t, \quad L_0(t) = \sqrt{\pi/2}\,\big(1 - 2\Phi(t)\big) \tag{A.3}$$

All of these functions are arbitrarily often differentiable, so the $L$-part of (D) holds as stated in the proposition. For $V(t)$ introduce

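$$S(t) := \mathrm E\big[\eta(X - t)^2\big], \qquad W(t) := V(t)^2$$

Then, suppressing the argument $t$, $W = S - L^2$, $W' = S' - 2LL'$, $W'' = S'' - 2L'^2 - 2LL''$. With $W_0 = W(0)$, $W_1 = W'(0)/W_0$, $W_2 = W''(0)/W_0$, and using $L(0) = 0$, $L'(0) = -1$, we get

$$W_0 = S(0), \qquad W_1 = S'(0)/S(0), \qquad W_2 = \big(S''(0) - 2\big)/S(0)$$

and hence $V(t) = \sqrt{W_0}\,\big(1 + \tfrac{W_1}{2}t + (\tfrac{W_2}{4} - \tfrac{W_1^2}{8})t^2\big) + O(t^3)$, so that

$$v_0 = \sqrt{S(0)}, \qquad v_1 = \frac{S'(0)}{2S(0)}, \qquad v_2 = \frac{2S''(0) - 4 - S'(0)^2/S(0)}{4S(0)}$$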
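In our case we have for $0 < c < \infty$

$$S(t) = A_c^2\big[c^2\big(1 - \Phi(t+c) + \Phi(t-c)\big) + (1 + t^2)\big(\Phi(t+c) - \Phi(t-c)\big) + (t-c)\varphi(t+c) - (t+c)\varphi(t-c)\big]$$

In addition, $S(t) = 1 + t^2$ for $c = \infty$ and $S(t) = \pi/2$ for $c = 0$, so (3.4) holds with

| | 0 < c < ∞ | c = 0 | c = ∞ |
|---|---|---|---|
| S(0) | 2b²(1 − Φ(c)) + A_c(1 − 2bφ(c)) | π/2 | 1 |
| S'(0) | 0 | 0 | 0 |
| S''(0) | 2A_c²(2Φ(c) − 1 − 2cφ(c)) | 0 | 2 |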

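and the assertions as to $v_0$, $v_1$, $v_2$ follow. As to (Vb), let $|t| \to \infty$. We treat $t \to \infty$; the case $t \to -\infty$ is symmetric. With Mill's ratio we get, for any $\delta > 0$ and with $\bar\Phi = 1 - \Phi$,

$$\big|b - |L(t)|\big| = A_c\big|(t+c)\bar\Phi(t+c) - (t-c)\bar\Phi(t-c) + \varphi(t-c) - \varphi(t+c)\big| = o\big(\exp(-\tfrac{t^2}{2+\delta})\big)$$

Again with Mill's ratio, $|S(t) - b^2| \le A_c^2\big[(t^2+1)\bar\Phi(|t|-c) + 2(|t|+c)\varphi(|t|-c)\big] = o\big(\exp(-\tfrac{t^2}{2+\delta})\big)$, and hence $V^2(t) = S(t) - L(t)^2 = o\big(\exp(-\tfrac{t^2}{2+\delta})\big)$. For $c = 0$ we get $\big|b - |L(t)|\big| = \sqrt{2\pi}\,\bar\Phi(|t|) = o(\exp(-t^2/2))$ and

$$V^2(t) = b^2 - \big(b + o(\exp(-t^2/2))\big)^2 = o(\exp(-t^2/2))$$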

For $\rho(t)$ and $\kappa(t)$, we introduce $M(t) := \mathrm E[\eta(X-t)^3]$ and $N(t) := \mathrm E[\eta(X-t)^4]$. Then, again suppressing the argument $t$,

$$\rho = V^{-3}\big[M - 3LS + 2L^3\big], \qquad \kappa = V^{-4}\big[N - 4ML + 6SL^2 - 3L^4\big] - 3$$

and hence $\rho_0 = v_0^{-3}M(0)$, $\kappa_0 = v_0^{-4}N(0) - 3$. For $\rho_1$ we note

$$\rho' = V^{-3}\Big(-3\big[M - 3LS + 2L^3\big]\frac{V'}{V} + \big(M' - 3L'S - 3LS' + 6L'L^2\big)\Big)$$

so that $\rho_1 = v_0^{-3}\big(-3M(0)v_1 + M'(0) + 3S(0)\big)$. In our case, for $c = \infty$, $M(t) = -3t - t^3$, $M'(t) = -3 - 3t^2$, $N(t) = t^4 + 6t^2 + 3$. For $c = 0$, $M(t) = (\pi/2)^{3/2}(1 - 2\Phi(t))$, $M'(t) = -2(\pi/2)^{3/2}\varphi(t)$, $N(t) = \pi^2/4$. Finally, for $0 < c < \infty$

$$M(t) = A_c^3\big[c^3 - \Phi(t+c)(c^3 + t^3 + 3t) - \Phi(t-c)(c^3 - t^3 - 3t) + (t^2 + tc + c^2 + 2)\varphi(t-c) - (t^2 - tc + c^2 + 2)\varphi(t+c)\big]$$

$$M'(t) = A_c^3\big[3\big(\Phi(t-c) - \Phi(t+c)\big)(t^2 + 1) - 3(t-c)\varphi(t+c) + 3(t+c)\varphi(t-c)\big]$$

$$N(t) = A_c^4\big[c^4 + \big(\Phi(t+c) - \Phi(t-c)\big)(t^4 + 6t^2 + 3 - c^4) + (t^3 - t^2c + tc^2 - c^3 + 5t - 3c)\varphi(t+c) - (t^3 + t^2c + tc^2 + c^3 + 5t + 3c)\varphi(t-c)\big]$$
This gives the assertion as to $\rho_0$, $\rho_1$ and $\kappa_0$, and hence (3.5) holds.

For $c > 0$, $\Pr(|\eta_c| < b) > 0$ and $\eta_c$ is continuous. But on $\{|\eta_c| < b\}$, $\mathcal L(\eta_c)$ is a.c., and hence Lemma 1.2 entails (C). □



A.3 Proof of Theorem 3.5

We plug $(X_i) \sim Q_n$ for some $Q_n \in Q_n(r)$ into the defining relations (1.8) for M-estimators.



Outline of the proof. We begin by conditioning w.r.t. the number $K = \sum_i U_i = k$ of contaminated observations. Next, for fixed $t \in \mathbb R$, we consider $\tilde T_{n,k}(t) = \sum_{i:U_i=1}\psi(X_i - t)$ and condition the probability w.r.t. its realization $\tilde t_{n,k}$. In the sequel we suppress the indices of $\tilde t_{n,k}$. Denote this event by

$$D_{k,\tilde t} := \{K = k,\ \tilde T_{n,k}(\sqrt t) = \tilde t\} \tag{A.4}$$

Thus 

nMSE(5',„e„|Dt,-)= f Pr(S^ > f | ) df = f Pr(S„ > V? I A- + f Pr(5 „<- V? I £>*,?) rf' 
Jo Jo Jo 

(A.5) 

For the sequel, we define $\tilde n := n - k$. To derive the result, we then partition the integrand according to the following tableau, where $C > 0$ is some constant and $\delta$ is the exponent from assumption (Vb):

                                       K ≤ k1 r√n     k1 r√n < K ≤ ε0 n     K > ε0 n
|t| ≤ k2 b² log(n)/n                      (I)               (II)             excluded
k2 b² log(n)/n < |t| ≤ C n^(1+δ)         (III)              (II)             excluded
|t| > C n^(1+δ)                           (IV)              (IV)             excluded


At this point we also summarize the constants that will be used throughout this section. 



constant 




ki 


value 


> 1 


>2V(| + 5|) 



For all cases except (I), we will show that they contribute only terms of order $o(n^{-1})$ to $n\,\mathrm{MSE}(S_n)$ and hence can be neglected. Applying Taylor expansions at large, we derive an expression which makes clear the following: independently of $t$ and eventually in $n$, the maximal MSE is attained if $\tilde t_{n,k}$ is either identically $kb$ or identically $-kb$ for all $t$ in (I). Equivalently, all contaminated observations are either smaller than $y_n - b\sqrt{k_2\log(n)/n}$ or larger than $y_n + b\sqrt{k_2\log(n)/n}$. Integrating out first $t$ and then $k$, we obtain the result (3.9) stated in Theorem 3.5.



Conditioning w.r.t. the number of contaminated observations. As announced, for the moment we condition w.r.t. the number $K = \sum_i U_i = k$ of contaminated observations in the sample. Denote the ideally distributed part as $T_{n,k}(t) := \sum_{i:U_i=0}\psi(X_i - t)$. Then we get

I rm - T„k{t)-nL{t) f^M-nUt) 
Pr{S„ <t\K = k\+ Rl^\k) = Pr(r„,t(/) < ~r,,t(f)) = < - ) (A.6) 

where $R^{(1)}(k) \ne 0$ can only happen for mass points of $\mathcal L\big(T_{n,k}(t) + \tilde T_{n,k}(t)\big)$.



Conditioning w.r.t. the actual contamination. Next, we condition the probability w.r.t. the actual value of the contamination $\tilde t_{n,k} = \tilde t$. This gives

Pr|S„ < t\Di,r]+R^°\k,t) = Pi.( ^"-^W""^W < (i)) (A.7) 
where again the remainder can only be nonzero at mass points of $\mathcal L\big(T_{n,k}(t)\big)$.
Negligibility of case (IV). Without loss of generality, assume that $\bar b = b$. By monotonicity and boundedness in assumption (bmi), for given $0 < \eta < b$ there is a $t_0 > 0$ such that for $t > t_0$,

$$-b < L(t) = \mathrm E\big[\psi(X^{id} - t)\big] < -b + \eta$$

Lett I > /o, (5 > and C > so that for r > fj, by (Vb), \ V(t)\ < C'r'""*. Then we apply the Chebyshev 
inequality to obtain for ? > f j 

rl , r r - „ , Chcb. nV^(^^t) (Vb) «C'r<'+*' 

Pr{5„> V?pt.,-)<Pr(r„,t(V?)-«L(Vf)>-f-nL(V?)) - - 



{t + 1x1(^1))^ (f + «L( ))^ 



(f + nb + [/t/j + nb + rj\- [k(b ~ b) + nb + rj]- (b- rj)^ 

and coiTespondingly (with b = -b) for Pr{5„ < - | Dj^j]; but 



(A.8) 



, r df = ^-^^^ =o(n-') (A.9) 

Negligibility of case (II)

Lemma A.1 Let

$$\kappa := k_1\log k_1 + 1 - k_1 \tag{A.10}$$

Then it holds that

$$\Pr\big(\mathrm{Bin}(n, r/\sqrt n) > k_1 r\sqrt n\big) \le \exp\big(-\kappa r\sqrt n + o(\sqrt n)\big) \tag{A.11}$$

Proof Ruckdeschel [25, Lem. A.2] □
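For the binomial, the bound of Lemma A.1 is of Chernoff type. With $\mu = np = r\sqrt n$, the inequality $\Pr(X \ge k_1\mu) \le \exp(-\mu(k_1\log k_1 + 1 - k_1))$ already holds without the $o(\sqrt n)$ term. The sketch below compares this bound with the exact binomial tail computed in log space. The function names and the parameter choices $r = 0.5$, $k_1 = 2$ are hypothetical.

```python
import math


def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Bin(n, p), summing the p.m.f. in log space for stability."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total


def kappa(k1):
    """Exponent constant kappa = k1 log k1 + 1 - k1 of (A.10)."""
    return k1 * math.log(k1) + 1.0 - k1


def tail_and_bound(n, r, k1):
    """Exact P(Bin(n, r/sqrt(n)) > k1 r sqrt(n)) and the bound exp(-kappa r sqrt(n))."""
    p = r / math.sqrt(n)
    mu = n * p                   # = r sqrt(n)
    k = math.floor(k1 * mu) + 1  # smallest integer strictly larger than k1 * mu
    return binom_upper_tail(n, p, k), math.exp(-kappa(k1) * mu)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, tail_and_bound(n, 0.5, 2.0))
```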

As $|t| \le Cn^{1+\delta}$ in (II), the integrand of $n\,\mathrm{MSE}(S_n, Q_n \mid D_{k,\tilde t})$ is bounded by some polynomial in $n$. Hence, by Lemma A.1, the contribution of (II) is indeed $o(n^{-1})$.

Another consequence of the exponential decay in (A.11) is that we may neglect values of $K > k_1 r\sqrt n$ when integrating along $K$.

Corollary A.2 Let K ~ Bin(n, r/ Vn ). Then, in the setup ofLemma \A.l\ for any j e N, 
for any < d < '\fii. 

Proof E[^:^I|^^,_,„),.^|] < «^Pr(X > k,{n)r^/Ji) ^ o(e-™"). □ 

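Negligibility of case (III). We apply Hoeffding's first bound from Lemma B.1:

$$\Pr\{S_n > \sqrt t \mid D_{k,\tilde t}\} \le \Pr\big(T_{n,k}(\sqrt t) > -\tilde t \mid D_{k,\tilde t}\big) \le \exp(-2n\Delta^2/b^2) \tag{A.13}$$

for $\Delta := -L(\sqrt t) - \tilde t/n$. As $\psi$ is isotone, $L$ is antitone; hence in case (III),

$$L(\sqrt t) \le L\big(b\sqrt{k_2\log(n)/n}\big) = -b\sqrt{k_2\log(n)/n} + o\big(\sqrt{\log(n)/n}\big) \tag{A.14}$$

Thus

$$\Delta \ge -L(\sqrt t) - \frac{kb}{n} \ge \frac{b}{\sqrt n}\Big[\sqrt{k_2\log(n)} + o\big(\sqrt{\log(n)}\big)\Big] \tag{A.15}$$

and $\exp(-2n\Delta^2/b^2) \le n^{-2k_2}(1 + o(n^0))$. The latter is $o(n^{-2-\delta})$, and thus integrating $n\,\mathrm{MSE}$ out along (III) we get something of order $o(n^{-1})$.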
Asymptotic normality. On (I), by Lemma 1.7,

I \ ^/nV(^/t) J 

for some $\gamma > 0$, uniformly in $t$ and $k$. For $i = 1, \ldots, \tilde n$, let $j_i \in \{1, \ldots, n\}$ be the indices such that $U_{j_i} = 0$. We may apply Theorem B.2 b) to (A.5)/(A.7), identifying

$$\xi_{i,t} := \frac{\psi(X_{j_i} - t) - L(t)}{V(t)}, \qquad i = 1, \ldots, \tilde n \tag{A.17}$$

and setting $\Theta := \Theta_n = \{|t| \le k_2 b^2\log(n)/n\}$. This application is possible, as $|\psi| \le b$, so $\sup_{t\in\Theta}\mathrm E|\xi_{1,t}|^5 < \infty$.

By condition (C) of our assumptions, the Cramér condition of the theorem holds if $n$ is large enough. We note that if in Theorem 3.5 we limit ourselves to the term $A_1$ and hence only assume (C'), we may apply Theorem B.2 a).

With $G_{n,t}(s)$ from (B.4) we define $\tilde G_{n,t}(u) := G_{n,t}(s_{n,k}(u))$ and $G_n(t) := \tilde G_{n,t}(t)$. For $|t| \le k_2 b^2\log(n)/n$ and $K \le k_1 r\sqrt n$, we obtain, uniformly in $t$ and $k$:

Tl 

0(exp(-r«)) + Pr IS,, > V? I D,,, ) = Pr ( J] f,^ > .v„,t( V?)) = 1 - G„( V?) + Oin^'^) (A. 18) 

1=1 

Hence, using negligibility of (II), (III) and (IV), and setting

= yfn/n, l„ = ^ki log(n), /i"' = k2h'^ log(n)/f! (A. 19) 

we obtain 

,,(0) 



,„a,|Otf) = (nV^n l-G„(V?) + G„(-V?)dr + o(n-'): 
Jo 



nMSE(5' 

Jo 

= 2in^)-'- uU-Gn(-^) + G„{--^))du + o(n-^) (A.20) 
Jo yn yn 

As G„ is arbitrarily smooth, integration by parts is available and gives 

Xbl„ 2 
^G'„{^)du + o{n-^) (A.21) 
w„ Vn Vn 

with 



R„ := k2 login)b^ [1 - G„(b V™ ) " G„(-b /-^)] (A.22) 
A closer look at s„ ii(±b -yj^^^^^ ) reveals 

Sn.k(±b V^¥^ ) ^ F „ = (l+o(n»)) (A.23) 

^ V^(vo + o(«'')) I'o 

We also note that, again by (bmi), $v_0^2 = \mathrm E[\psi^2] < b^2$, hence $b/v_0 > 1$. In particular, eventually in $n$,

$$\big|s_{n,k}\big(\pm b\sqrt{k_2\log(n)}\big)\big| > \sqrt{2\log(n)} \tag{A.24}$$

But as $|\psi| \le b$ by (bmi), we have $|\kappa| \le b^4$ and $|\rho| \le b^3$. Thus, by Mill's ratio, there is some $0 < K < \infty$, independent of $t$ and $n$, such that for any $s > 0$

max (1 - G„.,(.v), G„,,(-i)) < K\sf exp(-.r/2) (A.25) 
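The role of Mill's ratio here can be checked numerically. The sketch below verifies $1 - \Phi(s) \le \varphi(s)/s$. It also shows that at the threshold $s = \sqrt{2\log n}$ appearing in (A.24) one has $n(1 - \Phi(s)) \le 1/(2\sqrt{\pi\log n}) \to 0$, so the normal tail mass beyond this threshold is $o(n^{-1})$. The function names are hypothetical.

```python
import math


def normal_tail(s):
    """1 - Phi(s), computed via erfc to avoid cancellation for large s."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


def mills_bound(s):
    """Mill's ratio bound phi(s) / s, valid as an upper bound for 1 - Phi(s) when s > 0."""
    return math.exp(-0.5 * s * s) / (s * math.sqrt(2.0 * math.pi))


def scaled_tail_at_threshold(n):
    """n * (1 - Phi(sqrt(2 log n))) and its Mill's-ratio bound 1 / (2 sqrt(pi log n))."""
    s = math.sqrt(2.0 * math.log(n))
    return n * normal_tail(s), 1.0 / (2.0 * math.sqrt(math.pi * math.log(n)))


if __name__ == "__main__":
    for n in (10, 100, 10**4, 10**6):
        print(n, scaled_tail_at_threshold(n))
```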

Thus for n sufficiently large 

, /*,log(„) , , hb^ log(n) , , o^^^ r^/ log(n)^''2 , . o^x 

1 - G„(fe -y/ — ) = exp( — + o(n"))) = 0( ) (A.26) 
for some $\delta > 0$. The same goes for $G_n(-b l_n)$, and therefore $R_n = o(n^{-1})$ and

Xbiji 2 
g;,(-^)J« + o(«-') (A.27) 



To make more transparent, which terms are bounded to which degree, we introduce the following notation, 
which will also help MAPLE to ignore irrelevant terms := -^,J„,*(x) = «„,t(-^ ). Then on (I), u = 

0( iJ\og{n) ), = 0{n^). In particular this will not affect the remainder terms of the Taylor expansions of 
assumption (D). 

In the sequel, we drop the indices of s„ ii and .v„ t, where they are clear from the context. Next, we spell out 
G'„iu) in jA.27) more explicitly. Denote 

g„{s, t) := G„,,(i), G;i'(*) := [;|^„](i, t), Gg(.0 := [iQn\(s, t) (A.28) 

Then, as s'(x) = s'(^ )l yfh. 



) = [G<;i(.v(;t)).v'(x) + gS(.vW)] |^^^ = G<'^^^ ^(.v(«)) .?'(«) + G<2_^^ =: §„(«) -Jh 
and therefore 



u^gn(u)du + o{n-^) (A.29) 

Expanding $g_n(u)$. Considering $g_n(u)$ more closely, we expand the terms according to assumption (D), with the help of our MAPLE procedures asS, asSl, asg

^ ' V( — ) ''0 1 ' V*i 2 " 

+ - + (" - - ^2/2))] + 0(n-<l+«) (A.30) 

L'(-!^=) (f1 +L(^))V'(^) 

^ ^ y(-^) v^i—) v^^ v^^ 

+ - T - l^'z + ivi'2)w^ + «f''(v2 - 2v2))] + 0(,r<'+'") (A.31) 



as well as 



G 



(1) 



Tk;,pl(s^ - 15.?'* + 45*2 „ ig-)j o(«-('+'") (A.32) 
'-^a - J^) + 0(«"<"2+*'). This gives 
g„(«) = voy(J)[l + -jiPduJ^) + iP2("j'')] + 0(n-('+*>) (A.33) 



and respectively, g'^' ^(.v) = m(.v)/L(i _ j2) + 0(«"<"2+'*'). This gr 



for 

Pliu,t**) = -l2U-2viU + h'i + |2.(„_fll) (A.34) 

and $P_2(u, t^\natural)$ is a corresponding polynomial in $u$, $t^\natural$, $v_1$, $v_2$, $l_2$, $l_3$, $\rho_0$, $\rho_1$, and $\kappa_0$. Its exact expression may be taken from our MAPLE procedure asg.

To be able to calculate the integrals, we expand $\varphi(s)$ in a Taylor expansion about $s_1 = (u - t^\natural)/v_0$ as

>f(-s) = v(si)U - .vi) + - 1)(J- s,)2/2] + 0(«-<'+'>') (A.35) 

and hence g„(H) = iw(ii)g„(si) + 0(n-(l+''>) with g„(ii) := I + ^PiCii,?") + iPjCii,/") for 

^■^ — 3 V 2 
PiUij'^) = Po '' + (| + vi)ii - (/2 + 2vi).vivo + (h + vt)[s] - + 2^ (A.36) 
and $P_2(s_1, t^\natural)$ is a corresponding polynomial, again to be looked up from our MAPLE procedure asgns. This gives

h„{sMs)A{ds) + o(n-^) (A.37) 

for 

h„(s) = ui(sfg„{s), ui(s) = svo + /" (A.38) 

Selection of the least favorable contamination. The function $h_n(s)$ from (A.38) is a polynomial in $s$. Hence, on (I), where $|s| = O(\log(n))$, we may ignore terms of (pointwise-in-$s$) order $O(n^{-(1+\delta)})$. This gives a complicated expression of the form

$$h_n(s) = (s v_0 + t^\natural)^2 + \frac{1}{\sqrt n} Q_1 + \frac{1}{n} Q_2 \tag{A.39}$$

Here $v_0 Q_1$ is a polynomial in $s$, $t^\natural$, $v_0$, $l_2$, $v_1$, and $\rho_0$ with $\deg(Q_1, s) = 5$ and $\deg(Q_1, t^\natural) = 4$. Likewise, $v_0^2 Q_2$ is a polynomial in $s$, $t^\natural$, $v_0$, $l_2$, $v_1$, $\rho_0$, $l_3$, $v_2$, $\rho_1$, and $\kappa_0$ with $\deg(Q_2, s) = 8$ and $\deg(Q_2, t^\natural) = 6$. The exact expressions are available on the web-page and may be generated by our MAPLE procedure ashn. We denote the second partial derivative w.r.t. $t^\natural$ by an index $t,t$ and consider $h_{n;t,t}(s) = 2 + \frac{1}{\sqrt n}Q_{1;t,t} + \frac1n Q_{2;t,t}$, where $\deg(Q_{1;t,t}, s) = 3$ and $\deg(Q_{2;t,t}, s) = 6$. Under symmetry, more specifically under

$$l_2 = v_1 = \rho_0 = 0 \tag{A.40}$$

we have $Q_{1;t,t} = 0$ and $\deg(Q_{2;t,t}, s) = 4$. That is, on (I), uniformly in $s$, $h_{n;t,t}(s) = 2 + O(\log(n)^3/\sqrt n)$, and under (A.40) the remainder is even $O(\log(n)^4/n)$. Hence, eventually in $n$ and uniformly in $s$, $h_n$ is strictly convex in $t^\natural$. It therefore takes its maximum on the boundary, that is, for $|t^\natural|$ maximal.

Going back to the definition of $t^\natural$, we note that for fixed $n$ and $k$, $t^\natural = \tilde t/\sqrt{\tilde n} = \sum_{i:U_i=1}\psi(X_i - \sqrt t)/\sqrt{\tilde n}$. Obviously, $\tilde t$ is bounded in absolute value by $kb$. This value may be attained if (up to $O(n^{-1})$) all terms $\psi(X_i - \sqrt t)$ are either $b$ or $-b$ for all $t$ in (I). This amounts to concentrating essentially all the contamination either right of $y_n + b\sqrt{k_2\log(n)/n}$ or left of $y_n - b\sqrt{k_2\log(n)/n}$. The decision which of the two alternatives is least favorable is deferred to the end of this proof.

We may allow for deviations from this "outlyingness" as long as we do not affect the expansion of the MSE up to $o(n^{-1})$. Hence we may weaken the concentration property to (3.12) resp. (3.13): on (I), $|t^\natural|$ is bounded, so smallness of the probabilities in (3.12) resp. (3.13) entails that the expectations of $(t^\natural)^j$, $j = 1, \ldots, 6$, arising in $h_n(s)$ are also $o(n^{-1})$.

Denote by $Q_n^0$ a distribution in $Q_n$ which is contaminated according to (3.12) resp. (3.13). By the previous considerations, under $Q_n^0$ we may consider $|\tilde t|$ as being exactly $kb$, and we will treat the cases $\tilde t = \pm kb$ simultaneously. For the substitution $t^\natural = \pm kb/\sqrt{\tilde n}$, the following abbreviations are convenient

k:=kl^fn, k^ ■.= k/^ = k/n^ (A.41) 

Taking up the dependency on in ft„(.v) as h„is) = h„{s, t^), in the MAPLE procedure ash, we introduce 

h,,{s) = h„U, k^) = h„(s, k% (A.42) 



Integration w.r.t. $s$. In this step we integrate out $s$ in $h_n(s)$. As $bl_n/v_0 > \sqrt{2\log(n)}$, by Lemma B.4 we may drop the integration limits and get



nMSE{S,„Q°\K = k) = {n^)-^ ^ h„{s)<pU) A{d.i) + o{n-^) (A.43) 

For the integration we use that, for $X \sim \mathcal N(0, 1)$, $\mathrm E[X^j] = 0$ for $j = 1, 3, 5, 7$, and

$$\mathrm E[X^2] = 1, \quad \mathrm E[X^4] = 3, \quad \mathrm E[X^6] = 15, \quad \mathrm E[X^8] = 105 \tag{A.44}$$

and get (by our MAPLE procedures intesout and asMSEK)

nMSE(S,„ Q° \K = k) = o(«"') + (n^)"2[(/t'')2// + + ^[±(3/2 + 4vi)vlk^b ± kik^fb^] + 

+ f,l(Vl + 'ihXk^^'b^ + (3V2 + 2/3 + 31-2 + 15 ^ nhh)vl{kVb- + 
+(po(2vi + h) + ipi)v3) + (i2vi/2 +I3 + 3V2 + ^ll+ 9v])v*]] (A.45) 
As mentioned in Remark 3.6 c), the terms of $\kappa_0$ cancel out for $A_2$, as do the terms of $\rho_0$ for $A_1$.
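The even normal moments used in (A.44) are double factorials, $\mathrm E[X^{2m}] = (2m-1)(2m-3)\cdots 1$; in particular $\mathrm E[X^8] = 7\cdot5\cdot3\cdot1 = 105$. The small check below compares this formula with a trapezoidal quadrature of $\int x^j\varphi(x)\,dx$. The function names and the quadrature settings are hypothetical.

```python
import math


def normal_moment(j):
    """E[X^j] for X ~ N(0, 1): 0 for odd j, (j - 1)!! = (j-1)(j-3)...1 for even j."""
    if j % 2:
        return 0
    return math.prod(range(j - 1, 0, -2))


def numeric_moment(j, half_width=12.0, steps=24000):
    """Trapezoidal approximation of the integral of x^j phi(x) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** j * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for j in (2, 4, 6, 8):
        print(j, normal_moment(j), round(numeric_moment(j), 8))
```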
Collection of terms. As we want to calculate the expectation with respect to $K$, we have to expand terms in such a way that $k$ only appears in integer powers and in the numerator. For this purpose we employ our MAPLE procedures asNn, asKn, and get

in^)-^ = 1 + ^ + |- + o(«-'), = 1 + ^ + o(n""2), (n^T'^ = 1 + o(«") (A.46) 

= S + iL + oOri/2), (yt1)2 = p + 11 + + o(n-i), = p + 3| + o(n-'/2), (/tl)'' = P + oCn") 

(A.47) 

Substituting and n'' by means of these expressions, we obtain (MAPLE procedure asMSEk) 

MSE(5„, Ql\K = k) = o(n-'n . r,^ . [± (4r, . 3 . 1]W ± /./»P^^ ^ 

-yT! 

(3 fo2 ± 3 h IP + (I /2 + i /j) fo'') P + (3v2 + 9.-2 + ^ll + h + l2l2 Vi) vo" + ((/2 + 2vi)po + § pi) I'o^ 
((3 v'T + 3 V2 + 12 /2V1 + -if + 2 /i) fo2 1 + (6 + 8 vi ) Pvj. 

U — ! 12! : >_ ^ (A.48) 

n 

Integration w.r.t. $k$. By Corollary A.2, the event $\{K > (1 + \delta)r\sqrt n\}$ only contributes $o(n^{-1})$ to the expectations $\mathrm E[K^j]$, $j = 0, \ldots, 4$. We can therefore simply use Lemma A.1 to determine the MSE. Our MAPLE procedures intekout, asMSE then give the result:

$$n\,\mathrm E_{Q_n^0}[S_n^2] = r^2 b^2 + v_0^2 + \frac{1}{\sqrt n}A_1 + \frac{1}{n}A_2 + o(n^{-1}) \tag{A.49}$$

with 

Ai = vo-(±(4vi +3l2)h+\) + b^ + [2b^ ± hb^lr^ (A.50) 
A2 = vo^ {(h +2vi )po+^pi) + voU3v2 + ^ll + h +9 v^^ + 12 VI l2 ) + 

+f vo^ ((3 V2 + 3 + ;2 + 2 /3 + 12 i-'i h + I ± (8 vi + 6 12) b) ± 3 hb^ + 5 b"^ ] r- + 

+ l^h)b*±3l2b^ + 3h^)r'* (A.51) 



Decision upon the alternative (3.12) or (3.13). Denote by $Q_n^-$ a contaminated member of $Q_n(r)$ according to (3.12), and correspondingly by $Q_n^+$ one according to (3.13). With respect to the terms of (A.49) to (A.51), the maximal MSE is obviously achieved by $Q_n^-$ if $\sup\psi < -\inf\psi$, and by $Q_n^+$ if $\sup\psi > -\inf\psi$. In case $\sup\psi = -\inf\psi$, the terms in $A_1$ are decisive:

n(J^Qt^Sl^- ^Q-{Sl ]) = ^Xmr^b' + 3v2)(l + 2^) + 3*!iil±i)] + 4v2(l + ^)v,) + o(n-') (A.52) 

Hence, $Q_n^+$ [$Q_n^-$] is least favorable up to $o(n^{-1})$ if

i-'i > [<] - + 3)(1 + 4 - + 3(1 - 4)) (A.53) 



If equality holds in (A.53), no decision can be taken up to order $o(n^{-1})$.



A.4 Proofs to Propositions 6.1 and 6.2 



For $\varepsilon_1 \in (0, 1)$, let $N_+(t) = N_+(t; n, \varepsilon_1, b)$ and $N_-(t) = N_-(t; n, \varepsilon_1, b)$ be defined as

^+(0 := #[ifr{.Xi - f) > h{l - El), Uj = 0), N4t) := #{iA(a:,- - t) < b(\ - £1), {/,■ = O) (A.54) 
The idea behind Propositions 6.1 and 6.2 is to use the inclusions

I 2 '/'(X, - r) < 0) c {NM < n,\, { 2 ^(x, - > 0) c |iV_(0 < n_) (A.55) 
for some numbers $n_-$, $n_+$ yet to be specified.

For Proposition 6.1, we symbolically plug $\delta = 0$ into the tableau of Subsection A.3, so that the second and third lines are separated by $|t| = Cn$. All cases except case (IV) remain unchanged. For (IV), we consider the first inclusion of (A.55). In this case, $\{\sum_i\psi(X_i - t) < 0\}$ is distorted most importantly by $\tilde t = kb$. On the other hand, the $N'' = n - N_+ - K$ remaining observations cannot contribute less than $-N''b$, so

2 ^(Xi - < =^ NM^ -Ei) + Kh + N"h < (A.56) 
that is <(-«/) - K(b - bfjj{b{\ - £i ) - fc), and as this has to hold for all K < s'^n, < «( - - E'g(b - 
b))j{b{^ - ei) - fo) =: fi+ = n+Ceg), where by \6.2) and as < ei < 1, we get n+ = ne+ for 

<£+ = (- - SQ{h - b))j[b{l -E,)~b)< l~£g (A.57) 

Accordingly, for the second inclusion in (A.55), we obtain

/V_ < „e_ =: = n_(£^) for e_ := [h - s'^ih - b))j{b - bil - si)) (A.58) 

where again < e_ < 1 - s^. Hence with k = '"Eq"'' ~ 1 

Pr{5„ > V?|z)j..,-_i.i;) ^ Pr jr,a-( V^) > kb] < Prj7-,a(V?) > kb] < Pr j/V_( VF) <n_\K = k] (A.59) 

and con-espondingly Pr{5„ < - | < Pr{/V+(- ^/t) < n^\K = k]. But, £(N^,\K = k) is Bin(n - 

k,p^^) for 

p_(0 = Pr(iA(^"'- VF)<^(l-£l))=, pAt) = Pr mx'" + ^ft)>ba -El)) (A.60) 

That is, pjt) = F(^ft + p+(f) = - B^) where F = 1 - F and 

B_ :=inf{v|iACv)>(l-ei)*), B+ := sup [v | i/rtv) < (1 - ei)fo) (A.61) 

If we abbreviate m = n-k, m± = Pr = (1 ~ P+(0) V P-(f)- in the binomial probabilities in jA.59^ , we 
obtain ('j) < 2", ; = 0, . . . m±, and P-(0, (1 - P+(f)) ^ 1, so that 

supPr{|5„| > V?|z),j,^^,j) < „2"p!"'-f"-^'"-M (A.62) 
k 

But by |a37J, l-eg-(£_ Ve+) =: o- > 0, sofn-(m_ Vm!|.) > o-n- 1. Now, by (6l) , forB = max|B+,-B_), 
if H is so large that Cn > (7" - B)^, 

sup r Pr{|5„| > VF |z)n,-,.,,J < n2"+' f r*™-"/^^, = exp[-gn log(n)(l - o(n''))] 

k JCii ' JCn 

for some 5' > 0. So (IV) is indeed negligible. □ 
For Proposition 6.2 we only show the first case of (6.5); the second follows analogously. This time $K = 0$, $\tilde n = n$ is fixed, and we use the inclusions of the complements in (A.55). Thus

Pr{5„ > V?) > Pr|r„,o(V() > 0) > Pr|iV,( Vf) > «+(0)| 

Let p+ = f ( + B+). To (5 > there is a T > such that for f > T and p"+ > 1 - i5. Hence for t > and 
n' = H1+ + 1 

Pr{S„ > VF) > (j,j(l - p.)"'pr'"' > (",j(l - <5)F( VF + BJ'' 

Now by the first half of (6.5), for $d = 1/n'$, some $c > 0$, $T' > T$ and all $t > T'$,

$$t^{1/n'}\big(1 - F(t)\big) > c \;\Longrightarrow\; \big(1 - F(t)\big)^{n'} > c^{n'}t^{-1} \tag{A.63}$$
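Then for the M-estimator $S_n$,

$$\mathrm E_F\big[(S_n)_+^2\big] \ge \int_{T'}^\infty \Pr\{S_n > \sqrt t\}\,dt \ge \int_{T'}^\infty \binom{n}{n'}(1-\delta)\,c^{n'}\big(\sqrt t + B_+\big)^{-1}\,dt = \infty \tag{A.64}$$

□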
A.5 Proof of Proposition 6.3

For $t > v_0^2\log(n)/n$, we consider the following inclusion

{ip(x - VF) > - Co/ Vn) = > VF + B„} c {x > vo y/log(n)/n + B„] 

Let At, ■■= { E,: [/,=! - Vf ) < (/t - l)(b - Co/ V" ))■ Hence if f > V5 log(«)/n, by j6|6), for all 
k > (1 - 6)r^, 

VriAkj \K = k)>po (A.65) 

Now we proceed as in section [a3] and even with restriction jA.65) the arguments of subsecti on |A.3| 
remain in force, so that we have to maximize fi. But t > v^log(«)/« <=^ s > yjlogn in )A.37) . 
Hence on the event , for s e [ ^logn; bl,Jvo), we get the bound < (k^ - l){b - cq/ V")/ V^> while 
for i 6 (—blnlvo; ■\jlog n) respe ctively on "^Ai, ,, we bound by k'^b. Integrating out these two i-domains 
we obtain forA,, = n (mSE(5,„ Q°\K = k)- MSE(S,„ QI \K = k)) 



separately as in subsection 



A.3 



> PO 



IvQsDnik) + 2kbD„(k) - D„(kf)ifi(s) ds + o(n"') 



for D„{k) = kcfil Vn + bl ^fn + o(l/ But for < a\ < ai < txi, ip(ai)/a2 — (p(U2)/02 ^ f^^ <pi'<)ds, 
that with a[ = ^log n, = bl„/vo, and as <fi{a2) = o(n"'). 



A, > ^[2voD„(k) - 2^^^ifi^] + o(„-') = ^ A,W + o(n-l) 



Jinn 



Now the restriction to (1 - S)r < K <k\r V« by Lemma 
fo) + o(«-'). 



A.l 



may be dropped, giving zl„ > ^''^ (rco - 



B Auxiliary Results

B.1 Two Hoeffding Bounds

Lemma B.1 Let $\xi_i \overset{\text{i.i.d.}}{\sim} F$, $i = 1, \ldots, n$, be real-valued random variables with $|\xi_i| \le M$. Then for $\varepsilon > 0$

Proof Hoeffding [12, Thm. 2 and Thm. 1, inequality (2.1)]. □
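The statement of Lemma B.1 is cited from Hoeffding [12]. As an illustration only, we assume the standard form of Hoeffding's inequality for i.i.d. variables with values in $[lo, hi]$, namely $P(\bar\xi_n - \mathrm E\bar\xi_n \ge t) \le \exp(-2nt^2/(hi - lo)^2)$. The sketch compares this bound with a Monte Carlo estimate for the bounded Huber scores $\psi_c(X)$, $X \sim \mathcal N(0,1)$, as they occur in case (III). The parameters $n = 50$, $t = 0.2$, $c = 0.7$ and the function names are our own choices. The constant in the exponent depends on this range convention.

```python
import numpy as np


def hoeffding_bound(n, t, lo, hi):
    """Hoeffding: P(mean - E[mean] >= t) <= exp(-2 n t^2 / (hi - lo)^2) for i.i.d. values in [lo, hi]."""
    return float(np.exp(-2.0 * n * t ** 2 / (hi - lo) ** 2))


def empirical_tail(n, t, c, reps=20000, seed=1):
    """Monte Carlo P(mean of psi_c(X_i) >= t) with X_i ~ N(0, 1); E[psi_c(X)] = 0 by symmetry."""
    rng = np.random.default_rng(seed)
    scores = np.clip(rng.standard_normal((reps, n)), -c, c)
    return float(np.mean(scores.mean(axis=1) >= t))


if __name__ == "__main__":
    n, t, c = 50, 0.2, 0.7
    print(empirical_tail(n, t, c), hoeffding_bound(n, t, -c, c))
```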



B.2 A uniform Edgeworth expansion

The following theorem generalizes Ibragimov [15, Thm. 1] and Ibragimov and Linnik [16, Thm. 3.3.1] to the situation where the law of $\xi_i$ depends on an additional parameter $t$:

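Theorem B.2 For some set $\Theta \subset \mathbb R$ and fixed $t \in \Theta$, let $\xi_{i,t}$, $i = 1, 2, \ldots$ be a sequence of i.i.d. real-valued random variables with distribution $F_t$, and with

$$\mathrm E\xi_{1,t} = 0, \quad \mathrm E\xi_{1,t}^2 = 1, \quad \mathrm E\xi_{1,t}^3 = \rho_t, \quad \mathrm E\xi_{1,t}^4 - 3 = \kappa_t \tag{B.2}$$

Let $\Phi(s)$ and $\varphi(s)$ be the c.d.f. and p.d.f. of $\mathcal N(0, 1)$ and

$$F_n(s, t) := P\Big(\sum_{i=1}^n \xi_{i,t} \le s\sqrt n\Big), \qquad H_n(s, t) := \Phi(s) - \frac{\varphi(s)}{\sqrt n}\,\frac{\rho_t}{6}(s^2 - 1) \tag{B.3}$$

$$G_n(s, t) := H_n(s, t) - \frac{\varphi(s)}{n}\Big[\frac{\kappa_t}{24}(s^3 - 3s) + \frac{\rho_t^2}{72}(s^5 - 10s^3 + 15s)\Big] \tag{B.4}$$

Let $f_t$ be the characteristic function of $F_t$.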
(a) Assume $\sup_t |\kappa_t| < \infty$ and that there is some $u_0 > 0$ such that for all $u_1$ the "no-lattice" condition (C')

$$f_{u_0}(u_1) := \sup_t \sup_{u_0 \le |u| \le u_1} |f_t(u)| < 1 \tag{B.5}$$

is fulfilled. Then

$$\sup_{t \in \Theta}\sup_{s \in \mathbb R} |F_n(s, t) - H_n(s, t)| = o(n^{-1/2}) \tag{B.6}$$

(b) Assume $\sup_t \mathrm E|\xi_{1,t}|^5 < \infty$ and that the uniform Cramér condition (C)

$$\limsup_{|u| \to \infty}\sup_t |f_t(u)| < 1 \tag{B.7}$$

is fulfilled. Then

$$\sup_{t \in \Theta}\sup_{s \in \mathbb R} |F_n(s, t) - G_n(s, t)| = O(n^{-3/2}) \tag{B.8}$$
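The sketch below shows what $H_n$ and $G_n$ from (B.3) and (B.4) deliver in a concrete case. It takes $\xi_i = E_i - 1$ with $E_i \sim \mathrm{Exp}(1)$; this example is our own choice, not from the text. For it $\rho = 2$ and $\kappa = 6$, and the exact c.d.f. of the standardized sum is an Erlang/Gamma$(n, 1)$ c.d.f. The code compares the errors of $\Phi$, $H_n$ and $G_n$ on a grid. The function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def edgeworth_h(s, n, rho):
    """H_n(s) = Phi(s) - phi(s) rho (s^2 - 1) / (6 sqrt(n)), cf. (B.3)."""
    return std_normal_cdf(s) - std_normal_pdf(s) * rho * (s * s - 1.0) / (6.0 * math.sqrt(n))


def edgeworth_g(s, n, rho, kappa):
    """G_n(s) = H_n(s) - phi(s)/n [kappa/24 (s^3 - 3s) + rho^2/72 (s^5 - 10 s^3 + 15 s)], cf. (B.4)."""
    corr = kappa / 24.0 * (s ** 3 - 3.0 * s) + rho ** 2 / 72.0 * (s ** 5 - 10.0 * s ** 3 + 15.0 * s)
    return edgeworth_h(s, n, rho) - std_normal_pdf(s) * corr / n


def exact_cdf_exp_sum(s, n):
    """P(sum_{i<=n} (E_i - 1) <= s sqrt(n)) for E_i ~ Exp(1), via the Gamma(n, 1) c.d.f."""
    x = n + s * math.sqrt(n)
    if x <= 0.0:
        return 0.0
    term, partial = 1.0, 1.0
    for k in range(1, n):  # partial = sum_{k<n} x^k / k!
        term *= x / k
        partial += term
    return 1.0 - math.exp(-x) * partial


def max_errors(n, grid):
    """Maximal absolute errors of Phi, H_n and G_n over the grid (Exp(1): rho = 2, kappa = 6)."""
    rho, kappa = 2.0, 6.0
    err_phi = max(abs(exact_cdf_exp_sum(s, n) - std_normal_cdf(s)) for s in grid)
    err_h = max(abs(exact_cdf_exp_sum(s, n) - edgeworth_h(s, n, rho)) for s in grid)
    err_g = max(abs(exact_cdf_exp_sum(s, n) - edgeworth_g(s, n, rho, kappa)) for s in grid)
    return err_phi, err_h, err_g


if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-20, 21)]
    print(max_errors(10, grid))
```

For $n = 10$, both expansions are markedly closer to the exact c.d.f. than $\Phi$, whose error at $s = 0$ alone is about 0.04.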

Proof The general technique to prove Edgeworth expansions is to use Berry's smoothing lemma, which we take from Ibragimov and Linnik [16, Thm. 1.5.2] and apply to our case. Let $f_{n,t}$ be the characteristic function of $F_n(\cdot, t)$. Define the Edgeworth measures $G_{n,j,t}$, $j = 1, 2$, as $G_{n,1,t}(s) = H_n(s, t)$ and $G_{n,2,t}(s) = G_n(s, t)$, together with their Fourier-Stieltjes transforms $g_{n,j,t}(u) = \int e^{ius}G'_{n,j,t}(s)\,\lambda(ds)$. Further let $G'_{n,j} = \sup_t\sup_{s\in\mathbb R}|G'_{n,j,t}(s)|$. Then for $T > T' > 0$ it holds that

IP. A 1 r' !/"■'(")" , , 1 r \fn.M\ , . 
supsup|f„(.v,0-G„j.,(.v)| < sup- — i(di() + sup- — — — A(du) -^■ 

seR t t J-T' l«l I Jt'<IuI<T I«I 

+ sup- — — i(rfM) + sup— : (B.9) 

f " Jt'<\u\<t m t nT 

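But similarly as in Ibragimov [15, p. 462], for some constants $\gamma > 0$ and $c_j > 0$, we get for $T' = \gamma\sqrt{n}$ and $|u| < T'$

$$\frac{|f_{n,t}(u) - g_{n,j,t}(u)|}{|u|} \le c_j\, n^{-(j+1)/2}\,\big(|u|^{j+2} + |u|^{3j+2}\big)\,e^{-u^2/4}, \tag{B.10}$$

and hence the first summand in the RHS of (B.9) is $O(n^{-(j+1)/2})$. For the second summand, we note that $f_{n,t}(u) = f_t^n(u/\sqrt{n})$ and hence

$$\int_{T' < |u| < T} \frac{|f_{n,t}(u)|}{|u|}\,\lambda(du) = \int_{\gamma < |u| < T/\sqrt{n}} \frac{|f_t(u)|^n}{|u|}\,\lambda(du). \tag{B.11}$$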



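In case $j = 2$, for $\gamma$ sufficiently large, condition (C) gives $\sup_t \sup_{|u| > \gamma} |f_t(u)| =: \beta < 1$, and hence, for $T = n^{3/2}$,

$$\sup_t \int_{\gamma < |u| < T/\sqrt{n}} \frac{|f_t(u)|^n}{|u|}\,\lambda(du) \le 2\log\big(T/(\gamma\sqrt{n})\big)\,\beta^n = o(\beta^{n/2}). \tag{B.12}$$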



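In case $j = 1$, we proceed as in Ibragimov and Linnik [16, Lemma 3.3.1]: If $\sup_{u_1} f_\gamma(u_1) < 1$ for $\gamma$ sufficiently large, we may proceed as in case $j = 2$; else, (C′) says that for $\gamma$ sufficiently large, $f_\gamma(u_1)$ is isotone in $u_1$ and tends to 1. So we may define

$$\tilde l_n := \inf\{u_1 \,:\, f_\gamma(u_1) > 1 - 1/\sqrt{n}\}. \tag{B.13}$$

Setting $T = \sqrt{n}\,l_n$ for $l_n = \min(\tilde l_n, \sqrt{n})$, we see that $1/T = o(n^{-1/2})$ and, for $\gamma \ge 1$,

$$\sup_t \int_{\gamma < |u| < l_n} \frac{|f_t(u)|^n}{|u|}\,\lambda(du) \le 2\log(\sqrt{n}/\gamma)\,(1 - 1/\sqrt{n})^n \le \log(n)\,e^{-\sqrt{n}} = o(e^{-\sqrt{n}/2}). \tag{B.14}$$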



Hence the second summand in the RHS of (B.9) is $O(n^{-(j+1)/2})$. Also, it is easy to see that $G'_{n,j} < \infty$, and hence by the choice of $T$, the last summand in the RHS of (B.9) is $O(l_n^{-1} n^{-1/2}) = o(n^{-1/2})$ in case $j = 1$, and $O(n^{-3/2})$ for $j = 2$. Finally, by Mills' ratio, the third summand is again easily shown to be $O(\exp(-\gamma^2 n/3))$. □
B.3 Moments for the Binomial

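Lemma B.3 Let $X \sim \mathrm{Bin}(n, p)$. Then

$$\mathrm{E}[X] = pn, \qquad \mathrm{E}[X^2] = p^2n^2 + pn - p^2n, \tag{B.15}$$

$$\mathrm{E}[X^3] = p^3n^3 - 3p^3n^2 + 2p^3n + 3p^2n^2 - 3p^2n + pn, \tag{B.16}$$

$$\mathrm{E}[X^4] = p^4n^4 - 6p^4n^3 + 11p^4n^2 - 6p^4n + 6p^3n^3 - 18p^3n^2 + 12p^3n + 7p^2n^2 - 7p^2n + pn, \tag{B.17}$$

and consequently, for $p = r/\sqrt{n}$,

$$\mathrm{E}[X] = rn^{1/2}, \qquad \mathrm{E}[X^2] = r^2n + rn^{1/2} - r^2, \tag{B.18}$$

$$\mathrm{E}[X^3] = r^3n^{3/2} + 3r^2n + (r - 3r^3)\,n^{1/2} - 3r^2 + 2r^3n^{-1/2}, \tag{B.19}$$

$$\mathrm{E}[X^4] = r^4n^2 + 6r^3n^{3/2} + (7r^2 - 6r^4)\,n + (r - 18r^3)\,n^{1/2} + 11r^4 - 7r^2 + 12r^3n^{-1/2} - 6r^4n^{-1}. \tag{B.20}$$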

Proof Easy calculations in MAPLE; see procedure Binmoment. . . □
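The identities (B.15) to (B.20) can be checked mechanically. The proof refers to a MAPLE procedure; as an independent sketch (not that procedure), the Python below compares the closed forms with moments computed directly from the binomial p.m.f.: exactly, in rational arithmetic, for (B.15) to (B.17), and in floating point for $p = r/\sqrt{n}$ in (B.18) to (B.20). The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, isclose, sqrt


def binom_raw_moment(n, p, k):
    """E[X^k] for X ~ Bin(n, p), summed directly over the p.m.f. (exact if p is a Fraction)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) * x**k for x in range(n + 1))


def moments_B15_B17(n, p):
    """Closed forms (B.15)-(B.17) for E[X], E[X^2], E[X^3], E[X^4]."""
    m1 = p * n
    m2 = p**2 * n**2 + p * n - p**2 * n
    m3 = p**3 * n**3 - 3 * p**3 * n**2 + 2 * p**3 * n + 3 * p**2 * n**2 - 3 * p**2 * n + p * n
    m4 = (p**4 * n**4 - 6 * p**4 * n**3 + 11 * p**4 * n**2 - 6 * p**4 * n
          + 6 * p**3 * n**3 - 18 * p**3 * n**2 + 12 * p**3 * n
          + 7 * p**2 * n**2 - 7 * p**2 * n + p * n)
    return [m1, m2, m3, m4]


def moments_B18_B20(n, r):
    """Expansions (B.18)-(B.20) in powers of sqrt(n), valid for p = r / sqrt(n)."""
    s = sqrt(n)
    m1 = r * s
    m2 = r**2 * n + r * s - r**2
    m3 = r**3 * n * s + 3 * r**2 * n + (r - 3 * r**3) * s - 3 * r**2 + 2 * r**3 / s
    m4 = (r**4 * n**2 + 6 * r**3 * n * s + (7 * r**2 - 6 * r**4) * n + (r - 18 * r**3) * s
          + 11 * r**4 - 7 * r**2 + 12 * r**3 / s - 6 * r**4 / n)
    return [m1, m2, m3, m4]


p = Fraction(3, 10)
print([binom_raw_moment(7, p, k) for k in range(1, 5)] == moments_B15_B17(7, p))
```

Exact `Fraction` arithmetic makes the check of (B.15) to (B.17) an identity test rather than a tolerance test; only the irrational choice $p = r/\sqrt{n}$ needs a relative tolerance.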



B.4 Decay of the standard normal 

Finally, we note the following lemma for $\mathcal{N}(0,1)$ variables:

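Lemma B.4 Let $X \sim \mathcal{N}(0,1)$. Then for $0 < k < \infty$ and any sequence $(c_n)_n \subset \mathbb{R}$ with $\liminf_n c_n > \sqrt{2}$,

$$\mathrm{E}\big[|X|^k\, I_{\{|X| > c_n\sqrt{\log n}\}}\big] = o(n^{-1}). \tag{B.21}$$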

Proof Ruckdeschel [25, Lem. A.6]. □
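The role of the threshold $\sqrt{2}$ can be seen numerically. As a sketch, assuming the event in (B.21) is $\{|X| > c_n\sqrt{\log n}\}$ with a constant sequence $c_n \equiv c$ and integer $k$, the code evaluates $n\,\mathrm{E}[|X|^k I_{\{|X| > c\sqrt{\log n}\}}]$ exactly through the recursion $I_k(a) = a^{k-1}\varphi(a) + (k-1) I_{k-2}(a)$ for $I_k(a) = \mathrm{E}[X^k I_{\{X > a\}}]$ (integration by parts). The scaled quantity behaves like $n^{1-c^2/2}(\log n)^{(k-1)/2}$, so convergence is slow and very large $n$ are used.

```python
from math import erfc, exp, log, pi, sqrt


def phi(x):
    """p.d.f. of N(0, 1)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def upper_partial_moment(k, a):
    """I_k(a) = E[X^k 1{X > a}] for X ~ N(0, 1), integer k >= 0, a >= 0.

    Uses I_0 = P(X > a), I_1 = phi(a) and I_k = a^(k-1) phi(a) + (k-1) I_{k-2}.
    """
    if k == 0:
        return 0.5 * erfc(a / sqrt(2.0))
    if k == 1:
        return phi(a)
    return a ** (k - 1) * phi(a) + (k - 1) * upper_partial_moment(k - 2, a)


def scaled_tail(n, k, c):
    """n * E[|X|^k 1{|X| > c sqrt(log n)}], using the symmetry of N(0, 1)."""
    a = c * sqrt(log(n))
    return n * 2.0 * upper_partial_moment(k, a)


ns = [10**4, 10**8, 10**16, 10**32]
print([scaled_tail(n, 4, 1.6) for n in ns])  # c > sqrt(2): tends to 0
print([scaled_tail(n, 4, 1.2) for n in ns])  # c < sqrt(2): grows without bound
```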